This book contains the collected and unified material necessary for the presentation of such branches of modern cybernetics as the theory of electronic digital computers, the theory of discrete automata, the theory of discrete self-organizing systems, the automation of thought processes, the theory of image recognition, etc. It discusses the fundamentals of the theory of Boolean functions, algorithm theory, the principles of the design of electronic digital computers and of universal algorithmic languages, the fundamentals of perceptron theory, and some theoretical questions of the theory of self-organizing systems.

Many fundamental results in mathematical logic and algorithm theory are presented in summary form, without detailed proofs, and in some cases without any proof.

The book is intended for a broad audience of mathematicians and scientists of many specialties who wish to acquaint themselves with the problems of modern cybernetics.
FOREWORD


The objective of the present book is to acquaint the reader with several new scientific directions which constitute the basis of cybernetics in its modern conception. In the most general terms, all these trends can be subdivided into two major groups: the general theory of information conversion, and the theory and principles of the design of various kinds of information converters. However, the material which can be associated with these major trends is so extensive that it could hardly be presented, even in summary form, in a single book. It has therefore been necessary to select the material in accordance with some general principles.

The material for the present book has been selected in accordance with two basic principles. The first principle is the requirement that the material be formulated rigorously enough to be presented in the form of a mathematical theory (although with the bent toward practical simulation which is characteristic of cybernetics). The second principle is that the author limits himself, as a rule, to digital methods of representing information and to the digital conversion of information.

As a result of this selection, the book contains the following basic sections: algorithm theory (including programming for general-purpose electronic digital computers and universal algorithmic languages for programming), the theory of discrete automata (including the theory of Boolean functions and the principles of the design of general-purpose electronic digital computers), the theory of discrete self-organizing systems (including elements of the theory of optimal decisions) and, finally, mathematical logic (propositional calculus, the restricted predicate calculus and formal arithmetic), considered as a basis for automating the design of deductive theories (theories based on a particular system of axioms).

The degree of detail with which the material is presented is determined first of all by its novelty. The newer branches, related to cybernetics itself, are discussed in greater detail, and their fundamental theorems are supplied with quite detailed proofs. At the same time, in such branches as abstract algorithm theory and mathematical logic, which have developed within the framework of traditional mathematics, the material is presented more briefly and proofs are, as a rule, omitted.

The author has attempted, however, to convey the basic ideas and methods which are used to establish the validity of propositions as fundamental, from the point of view of mathematical logic, as the Gödel theorem on the incompleteness of arithmetic or the theorems which establish the algorithmic unsolvability of particular problems.

The book does not pretend to replace specialized monographs on the individual sections included here. Its primary intention is to help a wide audience of mathematicians and engineers master the minimum of knowledge necessary for work on the theoretical problems of modern "digital" cybernetics. It is well known that the existence of detailed monographs on a particular subject does not always make it possible for readers without specialized preparation to become acquainted with that subject. Convincing proof of this is the fact that, in spite of the existence of specialized monographs, such a theorem as Gödel's, mentioned above, which is of fundamental importance for all of mathematics, remains known to large numbers of mathematicians only by hearsay.

As for the present book, it requires of the reader (and even then only in one chapter, the fourth) a knowledge of only those elements of mathematical analysis and probability theory which are known to practically every engineer, not to mention mathematicians. The less widely known mathematical results needed to understand the main content of the book are included as supplementary material. An example of such supplementary material is the series of propositions of probability theory presented in Chapter 4, §2.

For readers who wish to extend their knowledge in a particular area or to become acquainted with the detailed proofs of those propositions which, although included in the book, are not proved in detail, we give below a summary of the contents of the book, indicating the specialized monographs (in Russian) pertaining to the individual sections. Unfortunately, such monographs cannot be found for all the sections of the book.

The first chapter describes the basic theoretical universal algorithmic systems (normal Markov algorithms, the Kolmogorov-Uspenskiy algorithmic system, recursive functions, Post algorithms, and the Turing machine). Also presented are the basic principles of the proofs of the algorithmic unsolvability of certain very simple mass problems.

At the present time there is no unifying monograph on the entire theory of algorithms as a whole. Moreover, not all the questions mentioned above are covered in any detail in the monographic literature. Among the principal monographs on the individual algorithmic systems we might mention the following: on the theory of normal algorithms, Theory of Algorithms, A. A. Markov (Ref 53); on the theory of recursive functions and Turing machines, Introduction to Metamathematics, S. C. Kleene (Ref 42), and Course on Computable Functions, V. A. Uspenskiy (Ref 76).

The theory of Boolean functions and its applications to the theory of discrete automaton circuits are presented in the second chapter. These questions are discussed in greater detail in the monograph of V. M. Glushkov, Synthesis of Digital Automata (Ref 26).

In addition, the second chapter covers the fundamentals of the propositional calculus. More detail on the propositional calculus can be found, for example, in the monograph of P. S. Novikov, Elements of Mathematical Logic (Ref 61).

The third chapter is devoted to the abstract and structural theory of discrete (finite) automata. The questions relating to this subject are considered in more detail in the monograph of Glushkov mentioned above. They are covered from a somewhat different standpoint in the monograph of N. Ye. Kobrinskiy and B. A. Trakhtenbrot, Introduction to the Theory of Finite Automata (Ref 47).

The fundamentals of the theory of discrete self-organizing systems are presented in the fourth chapter. A quantitative measure of self-organization and self-learning is defined, and a study is made of the behavior of random automata and of automata operating under conditions of random external inputs. Special attention is devoted to the problem of image recognition and to the theory of one class of devices (the so-called α-perceptrons) intended for solving this problem. Some questions of the simulation of conditioned reflexes are considered, as are questions of teaching meaning recognition and of the generation of new concepts. At the end of the chapter, in connection with the ideas of self-adjustment and extremal regulation, descriptions are given of several general methods for the solution of extremal problems (the method of steepest descent and its refinements, the simplex method for solving linear programming problems, and the so-called method of sequential analysis of variants for solving dynamic programming problems).

So far no unifying monograph on the material of the fourth chapter is available. Moreover, almost all the questions discussed in this chapter (with the exception of the method of steepest descent and the simplex method) have not yet been covered in the monographic literature. Several questions related (but not identical) to those considered in this chapter are covered in Principles of Neurodynamics by F. Rosenblatt, which has not yet been translated into Russian. A large number of monographs is devoted to the methods of solving extremal problems (with the exception of the method of sequential analysis of variants). However, we shall not list them here, since these questions have no direct relation to the primary theme of the present book.

The fifth chapter covers the basic principles of the design of general-purpose electronic digital computers and of programming for these machines. So many monographs have been devoted to this question that it would be very difficult to list them all. On the subject of programming we might cite, in particular, the monograph of B. V. Gnedenko, V. S. Korolyuk and Ye. L. Yushchenko, Elements of Programming (Ref 31). As for the principles of computer design, in spite of the existence of many good specialized monographs, a detailed presentation of the material in the framework we need does not exist: the principles of the design of electronic digital computers are, as a rule, presented in isolation from the general theory of algorithms.
In addition, the fifth chapter presents a detailed description of the universal algorithmic language ALGOL-60 and gives examples of ALGOL programs for various problems, primarily from the theory of self-organizing systems. In particular, it discusses the programming of the perceptron learning process and of a simplified model of the process of biological evolution. Again, there is little information on this question in the monographic literature: the Report on the Algorithmic Language ALGOL-60 (edited by P. Naur), published by the Computer Center of the USSR Academy of Sciences (Moscow, 1960), is of a reference nature and is not suitable for practical instruction in the ALGOL language.

The last (sixth) chapter gives a summary exposition of the fundamentals of the restricted predicate calculus (including the formal system of Gentzen) and of formal arithmetic (including the Gödel theorem on the incompleteness of arithmetic). Detailed proofs of the propositions presented can be found in the previously cited monographs of Kleene and Novikov. This chapter also contains elements of the automation of proofs and of the formulation of theorems in deductive theories. The questions touched on here have not yet been covered in the monographic literature.

As the list above shows, several interesting branches of modern cybernetics are not included in the book. Given the criteria for the selection of material mentioned previously, we could, for example, have included a presentation of the fundamentals of mathematical linguistics or elements of game theory. However, even without these, the considerable size of the book has forced the author to refrain from including any additional material. At the same time, the contents of the book do encompass those questions which at the present time are usually considered the basis of theoretical cybernetics (given that we limit ourselves to discrete methods). The author therefore hopes that the book will help those occupied with individual applied aspects of cybernetics, as well as all who are interested in its theoretical problems, to master the mathematical apparatus of cybernetics and to prepare for work in its theoretical fields.

In the present book extensive use has been made of material from courses on the various branches of cybernetics and mathematical logic given by the author at Kiev University and at the Kiev House of Scientific and Technical Propaganda in 1959-1962. Part of this material (the theory of algorithms, for example) has previously been published for internal use. The present book can be considered the first sufficiently complete textbook for students of the branches of cybernetics mentioned above.


Chapter 1

ABSTRACT THEORY OF AUTOMATA
§1. ALPHABETIC OPERATORS AND ALGORITHMS

In modern mathematics, structurally specified correspondences between words in abstract alphabets are customarily called algorithms.

Any finite ensemble of objects, termed the letters of the given alphabet, is called an abstract alphabet. The nature of these objects is a matter of complete indifference to us. For example, the letters of the alphabet of any language (Russian, Latin, Greek, etc.), digits, and any symbols, figures, etc., can be considered letters of abstract alphabets. If we wish, we can introduce an abstract alphabet whose letters are entire words of some particular language (Russian, for example). It is important only that the alphabet considered be finite, i.e., that it consist of a finite number of letters.

Having introduced the concept of an (abstract) alphabet, we define a word in this alphabet as any finite ordered sequence of its letters. For example, in the alphabet A = A(x, y) consisting of the two letters x and y, we consider any of the sequences x, y, xx, xy, yx, yy, xxx, ... to be words. The number of letters in a word is normally termed the length of the word, so that the words just listed have the lengths 1, 1, 2, 2, 2, 2, 3, ... respectively.
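A short Python sketch (an illustration added here, not part of the original text) that lists the words of A(x, y) in order of increasing length and confirms the lengths above. The helper `words_up_to` is hypothetical, and letters are assumed to be one-character strings.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """Return all nonempty words over `alphabet` of length <= max_len,
    shortest first; the empty word is omitted."""
    words = []
    for length in range(1, max_len + 1):
        # product() yields every sequence of `length` letters, in the order of the alphabet
        for letters in product(alphabet, repeat=length):
            words.append("".join(letters))
    return words

ws = words_up_to("xy", 3)
print(ws[:7])                    # ['x', 'y', 'xx', 'xy', 'yx', 'yy', 'xxx']
print([len(w) for w in ws[:7]])  # [1, 1, 2, 2, 2, 2, 3]
print(len(ws))                   # 14 = 2 + 4 + 8
```

In general an m-letter alphabet has exactly $m^k$ words of length k, which is why the count above is 2 + 4 + 8.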

Along with words of positive length (consisting of at least one letter), in many cases it is convenient to consider also an empty word, containing no letters at all. In the present chapter the small Latin letter e is used to designate the empty word. Sometimes, however, it is convenient to designate the empty word in complete accordance with its definition, by writing nothing at all in the place corresponding to this word.

We note that, with the accepted definition, the concept of a word in the Russian alphabet differs from the concept of a word as accepted in ordinary language. With our definition, any combination of letters, including meaningless combinations, is to be considered a word: the combinations of letters "algorithm", "mathematics", "kit", "dddd" must to an equal degree be considered words of the Russian alphabet (regarded as an abstract alphabet).

When an alphabet is expanded, i.e., when new letters are included in it, the concept of a word may undergo significant changes. If, for example, we expand the Russian alphabet by the "letters" " (quotation mark) and , (comma), then the four words which we have just written out in the Russian alphabet, together with their quotation marks and the commas between them, can be considered as a single word in the alphabet expanded in this fashion. By supplementing the Russian alphabet with the punctuation marks and the separation mark (the space left between two neighboring words), we can, if we wish, consider entire phrases, paragraphs and even entire books as individual words.

In just the same way, the expression 69 + 72, which is two words (69 and 72) in the alphabet A of the ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) joined by the plus sign, can be considered as a single word in the expanded alphabet obtained by adjoining to A the new letter "+" (the plus sign).

An alphabetic operator, or alphabetic mapping, is any correspondence (function) which associates words in some fixed alphabet with words in the same or another fixed alphabet. The first alphabet is termed the input alphabet, and the second the output alphabet, of the given operator. If the input and output alphabets coincide, we say that the alphabetic operator is given in the corresponding alphabet.

Hereafter we consider primarily single-valued alphabetic operators, which associate with each input word (a word in the input alphabet of the operator) no more than one output word (a word in the output alphabet of the operator). If the alphabetic operator does not associate any output word (not even the empty word) with a given input word p, we say that it is not defined on this word. The set of all words on which an alphabetic operator is defined is termed its domain of definition.
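As an illustration of a single-valued, partially defined operator, the sketch below (hypothetical, written for this note) takes the ten digits plus the letter "+" as input alphabet and the ten digits as output alphabet, in the spirit of the 69 + 72 example. It is defined only on words of the form digits, plus sign, digits; Python's `None` stands in for "not defined".

```python
DIGITS = set("0123456789")

def add_operator(word):
    """Hypothetical alphabetic operator: input alphabet = digits and '+',
    output alphabet = digits. Defined only on words <digits>+<digits>;
    returns None where the operator is not defined."""
    left, sep, right = word.partition("+")  # split at the first '+'
    if sep != "+" or not left or not right:
        return None
    if not (set(left) <= DIGITS and set(right) <= DIGITS):
        return None
    return str(int(left) + int(right))

print(add_operator("69+72"))  # 141
print(add_operator("69+"))    # None (outside the domain of definition)
print(add_operator("6+9+7"))  # None
```

The domain of definition here is the set of words on which the function returns a string rather than `None`.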

On the basis of the foregoing, from now on (unless otherwise specified) we shall always understand by the term "alphabetic operator" a single-valued and, generally speaking, partial mapping of the set of words in the input alphabet of the operator into the set of words in its output alphabet.

Since alphabetic operators may be defined on less than all words, we can, without loss of generality, always assume that the input and output alphabets of an operator coincide. For this it is clearly sufficient to combine the input and output alphabets of the given operator φ into one common alphabet A and to consider φ as an operator in this combined alphabet, defined only on those words which belonged to the original domain of definition of φ.

With each alphabetic operator there is associated an intuitive notion of its complexity. The simplest operators are those which perform a letter-by-letter mapping. Such a mapping consists in replacing each letter x of the input word p by some letter y of the operator's output alphabet, which depends only on the letter x and not on the choice of the input word p. A letter-by-letter mapping is completely defined by specifying the correspondence between the letters of the input and output alphabets.
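A minimal sketch of a letter-by-letter mapping, assuming letters are one-character strings and the letter correspondence is given as a Python dict (the name `letterwise` and the particular correspondence are hypothetical). Such a mapping need not be reversible: here b and c both go to 1.

```python
def letterwise(mapping):
    """Return the letter-by-letter operator determined by `mapping`,
    a dict from input letters to output letters."""
    def operator(word):
        if any(ch not in mapping for ch in word):
            return None  # a letter outside the input alphabet: not defined
        # Each letter is replaced independently of the rest of the word
        return "".join(mapping[ch] for ch in word)
    return operator

# Hypothetical input alphabet {a, b, c}, output alphabet {0, 1}
to_bits = letterwise({"a": "0", "b": "1", "c": "1"})
print(to_bits("abcab"))  # 01101
```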

The so-called coding transformations, which for brevity we shall term simply codings, are of great importance for the later discussion. In the simplest case the words in one alphabet, say A, are coded by words in another alphabet, B, as follows: to each letter $a_i$ of the alphabet A there is associated some finite sequence $b_{i_1} b_{i_2} \ldots b_{i_k}$ of letters in the alphabet B, called the code of the corresponding letter, such that different letters of the alphabet A are associated with different codes.

To construct the desired coding transformation it is now sufficient to replace all the letters of any word p in the alphabet A by the codes corresponding to them. The word thus obtained in the alphabet B we term the code of the original word p. We stipulate that the coding transformation must necessarily be reversible; in other words, different words in alphabet A must have different codes. The condition of reversibility of the coding is nothing other than the condition that the corresponding coding transformation be one-to-one.
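The construction of a coding transformation from letter codes can be sketched as follows. The function `make_coding` and the sample codes are hypothetical choices for illustration, with one-character letters assumed; the check for distinct codes mirrors the requirement that different letters receive different codes.

```python
def make_coding(codes):
    """Return the coding transformation determined by `codes`, a dict that
    assigns to each letter of A its code (a word in B)."""
    if len(set(codes.values())) != len(codes):
        raise ValueError("different letters must have different codes")
    def encode(word):
        # Replace every letter of the word by its code and concatenate
        return "".join(codes[letter] for letter in word)
    return encode

# Hypothetical codes of the letters of A = {a, b, c} in B = {0, 1}
encode = make_coding({"a": "0", "b": "10", "c": "11"})
print(encode("cab"))  # 11010
```

Distinct letter codes alone do not make the whole transformation reversible; that needs a further condition on the codes.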

It is easy to see that reversibility of the coding is not ensured by the single condition that the codes of the different letters (words of length 1) be different. Indeed, if the letter $a_1$ is associated with the code bb and the letter $a_2$ with the code b, then the code bbb will clearly correspond both to the word $a_1 a_2$ and to the words $a_2 a_1$ and $a_2 a_2 a_2$.

It is not difficult to verify that the coding will be reversible whenever the following two conditions are fulfilled:

a) the codes of the different letters of the original alphabet A are different;

b) the code of any letter of the alphabet A cannot coincide with any initial segment of the code of another letter of this alphabet.

Indeed, let us assume that both of these conditions are satisfied, and let the word $q = b_{i_1} b_{i_2} \ldots b_{i_n}$ be the code of some word $p = a_{j_1} a_{j_2} \ldots a_{j_m}$ in the alphabet A. Let us show that the word p can be uniquely recovered from its code q. In view of conditions a) and b), only one initial segment of the word q can coincide with the code of a letter of the alphabet A. Clearly, the code of the letter $a_{j_1}$ is such a segment. Discarding this segment, we obtain the code of the word $p_1 = a_{j_2} \ldots a_{j_m}$. Applying the same reasoning to it, we uniquely restore the next letter ($a_{j_2}$) of the word p, and so on. In this way all the letters of the word p are restored one after the other. Consequently, any given code can correspond to only one word in the alphabet A, which proves the reversibility (one-to-one character) of the coding transformation.
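The decoding procedure used in the proof, together with a check of conditions a) and b), might look like this in Python. This is a sketch under the assumptions that letters are one-character strings and codes are nonempty; `is_prefix_free` and `decode` are hypothetical names.

```python
def is_prefix_free(codes):
    """Check conditions a) and b) for a dict letter -> code."""
    values = list(codes.values())
    if len(set(values)) != len(values):
        return False  # condition a) fails: two letters share a code
    # Condition b): no code is an initial segment of a different code
    return not any(u != v and v.startswith(u) for u in values for v in values)

def decode(code, codes):
    """Recover p from its code q letter by letter, as in the proof.
    Assumes `codes` satisfies a) and b) and that all codes are nonempty."""
    letters = []
    while code:
        # Under a) and b) at most one letter code is an initial segment of `code`
        matches = [a for a, c in codes.items() if code.startswith(c)]
        if not matches:
            raise ValueError("not the code of any word in A")
        letter = matches[0]
        letters.append(letter)
        code = code[len(codes[letter]):]  # discard the recovered segment
    return "".join(letters)

codes = {"a": "0", "b": "10", "c": "11"}
print(is_prefix_free(codes))                    # True
print(decode("11010", codes))                   # cab
print(is_prefix_free({"a1": "bb", "a2": "b"}))  # False
```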

Condition b) is satisfied if the codes of all the letters of the original alphabet have identical length. By convention we call the coding in this case normal. The use of coding makes it possible to reduce the study of arbitrary alphabetic transformations to that of alphabetic transformations in some standard alphabet selected once and for all. Most frequently the so-called binary alphabet is chosen as such a standard alphabet; it consists of two letters, usually identified with the digits 0 and 1.
Let A be an arbitrary alphabet and B a standard alphabet (binary, for example) consisting of more than one letter. If n is the number of letters in alphabet A and m is the number of letters in alphabet B, then we can always select a number k so as to satisfy the inequality

$$m^k \geq n. \qquad (1)$$

Since the number of different words of length k in an m-letter alphabet is clearly equal to $m^k$, inequality (1) shows that we can code all the letters of alphabet A by words of length k in alphabet B so that the codes of different letters are different. Any such coding is normal and, in light of what was said above, generates a reversible coding transformation of the words in alphabet A into words in alphabet B. We designate this transformation by $\alpha$ and use $\alpha^{-1}$ to designate the reverse transformation, which transforms each word q in the alphabet B that is the code of some word p in alphabet A into the word p.
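The following sketch builds a normal coding from inequality (1) by choosing the smallest k with $m^k \geq n$ and assigning the first n words of length k, together with the transformation $\alpha$ and its reverse $\alpha^{-1}$. The names `normal_coding`, `alpha` and `alpha_inv`, and the sample alphabets, are hypothetical; `None` marks words of B that are not codes.

```python
from itertools import product

def normal_coding(alphabet_a, alphabet_b):
    """Normal (fixed-length) coding of the letters of A by words of B.
    Chooses the smallest k with m**k >= n, where n = |A|, m = |B| >= 2."""
    n, m = len(alphabet_a), len(alphabet_b)
    if m < 2:
        raise ValueError("the standard alphabet must have more than one letter")
    k = 1
    while m ** k < n:
        k += 1
    # product() enumerates the m**k words of length k; zip takes the first n
    words_k = ("".join(t) for t in product(alphabet_b, repeat=k))
    return k, dict(zip(alphabet_a, words_k))

def alpha(word, codes):
    """Coding transformation: replace each letter by its code."""
    return "".join(codes[x] for x in word)

def alpha_inv(code, codes, k):
    """Reverse transformation: cut the code into blocks of length k.
    Returns None if `code` is not the code of any word in A."""
    back = {c: x for x, c in codes.items()}
    if len(code) % k:
        return None
    blocks = [code[i:i + k] for i in range(0, len(code), k)]
    if any(b not in back for b in blocks):
        return None
    return "".join(back[b] for b in blocks)

k, codes = normal_coding("abcde", "01")
print(k, codes)  # 3 {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
q = alpha("bead", codes)
print(q, alpha_inv(q, codes, k))  # 001100000011 bead
```

Because all codes have the same length k, decoding needs no search at all: the code is simply cut into blocks of k letters.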

Now if $\varphi$ is an arbitrary alphabetic operator in alphabet A, then the transformation $\psi = \alpha^{-1}\varphi\alpha$, obtained as the result of the successive performance of the transformations $\alpha^{-1}$, $\varphi$ and $\alpha$ (products of transformations are read from left to right), will obviously be some alphabetic operator in the standard alphabet B. We term this operator the alphabetic operator in the alphabet B conjugate (by means of the coding $\alpha$) to the alphabetic operator $\varphi$.

The operator $\varphi$ is uniquely recovered from the conjugate operator $\psi$ and the corresponding coding transformation $\alpha$:

$$\varphi = \alpha\psi\alpha^{-1}. \qquad (2)$$

With the aid of this equation, and also its dual equation, written previously,

$$\psi = \alpha^{-1}\varphi\alpha, \qquad (3)$$

arbitrary alphabetic operators are reduced to alphabetic operators in the standard alphabet. This reduction, of course, can be performed in infinitely many different ways, since there exist infinitely many different codings of words in any given alphabet by words in the standard alphabet.
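To make equations (2) and (3) concrete, here is a hypothetical example with A = {a, b, c}, a normal binary coding with k = 2, and the word-reversal operator as $\varphi$. Products of transformations are read left to right, so $\psi = \alpha^{-1}\varphi\alpha$ means: first $\alpha^{-1}$, then $\varphi$, then $\alpha$. All names and choices are assumptions made for illustration.

```python
# Hypothetical normal coding of A = {a, b, c} by words of length 2 in B = {0, 1}
CODES = {"a": "00", "b": "01", "c": "10"}
BACK = {c: x for x, c in CODES.items()}
K = 2

def alpha(p):
    """Coding transformation alpha: words in A -> words in B."""
    return "".join(CODES[x] for x in p)

def alpha_inv(q):
    """Reverse transformation alpha^-1; None where q is not a code."""
    if len(q) % K:
        return None
    blocks = [q[i:i + K] for i in range(0, len(q), K)]
    if any(b not in BACK for b in blocks):
        return None
    return "".join(BACK[b] for b in blocks)

def phi(p):
    """An operator in A chosen for illustration: reverse the word."""
    return p[::-1]

def psi(q):
    """Equation (3), psi = alpha^-1 phi alpha: apply alpha^-1, then phi, then alpha."""
    p = alpha_inv(q)
    return None if p is None else alpha(phi(p))

def phi_recovered(p):
    """Equation (2), phi = alpha psi alpha^-1: apply alpha, then psi, then alpha^-1."""
    q = psi(alpha(p))
    return None if q is None else alpha_inv(q)

print(psi("000110"))         # 100100
print(phi_recovered("abc"))  # cba
print(psi("11"))             # None: '11' is not the code of any word in A
```

The conjugate operator `psi` works only on binary words, yet `phi_recovered` agrees with `phi` on every word of A, as equation (2) asserts.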

The described reduction can also be accomplished for alphabetic operators whose input and output alphabets are different. For example, let $\varphi$ be an arbitrary alphabetic operator with input alphabet A and output alphabet C, let B be the standard alphabet, let $\alpha$ be any (reversible) coding of words in alphabet A by words in the standard alphabet, and let $\gamma$ be an analogous coding of the words in alphabet C.

It is now easy to see that the transformation $\psi = \alpha^{-1}\varphi\gamma$ is an alphabetic operator in the standard alphabet B from which, provided the coding transformations $\alpha$ and $\gamma$ are known, the original transformation $\varphi$ is uniquely restored, namely as $\varphi = \alpha\psi\gamma^{-1}$.
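The same idea with different input and output alphabets, sketched under illustrative assumptions: A is the ten digits plus "+", C is the ten digits, $\varphi$ adds two numbers, and both $\alpha$ and $\gamma$ are 4-bit fixed-length binary codings. Here $\psi = \alpha^{-1}\varphi\gamma$ and, reading left to right, $\varphi$ is recovered as $\alpha\psi\gamma^{-1}$. All names are hypothetical.

```python
A = "0123456789+"  # hypothetical input alphabet
C = "0123456789"   # hypothetical output alphabet

def fixed_length_coding(alphabet, k):
    """Normal coding: the i-th letter gets the binary numeral of i, padded to k bits."""
    codes = {x: format(i, "0{}b".format(k)) for i, x in enumerate(alphabet)}
    return codes, {c: x for x, c in codes.items()}

ALPHA, ALPHA_BACK = fixed_length_coding(A, 4)  # 2**4 = 16 >= 11
GAMMA, GAMMA_BACK = fixed_length_coding(C, 4)  # 2**4 = 16 >= 10

def encode(word, codes):
    return "".join(codes[x] for x in word)

def decode(code, back, k=4):
    """Reverse transformation; None if `code` is not a code word."""
    if len(code) % k:
        return None
    blocks = [code[i:i + k] for i in range(0, len(code), k)]
    if any(b not in back for b in blocks):
        return None
    return "".join(back[b] for b in blocks)

def phi(word):
    """Operator with input alphabet A and output alphabet C: add two numbers.
    Defined only on words <digits>+<digits>."""
    left, sep, right = word.partition("+")
    if not (sep and left.isdigit() and right.isdigit()):
        return None
    return str(int(left) + int(right))

def psi(q):
    """psi = alpha^-1 phi gamma: decode with alpha^-1, apply phi, encode with gamma."""
    p = decode(q, ALPHA_BACK)
    out = None if p is None else phi(p)
    return None if out is None else encode(out, GAMMA)

def phi_recovered(p):
    """phi = alpha psi gamma^-1, applied left to right."""
    r = psi(encode(p, ALPHA))
    return None if r is None else decode(r, GAMMA_BACK)

print(phi_recovered("69+72"))  # 141
```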

The concept of the alphabetic operator is extremely general. Indeed, any process of information conversion reduces to it or can in some sense be reduced to it. Here and in what follows, by information we shall understand not only meaningful messages but, in general, any information on processes and states of any nature which can be detected by the sense organs of man or by instruments.

For certain special forms of information, for example lexical or numerical information, the alphabetic method of specification is the most natural one and is constantly used. The transformations of these forms of information reduce to alphabetic operators in the most direct fashion: both the input and the output information of any information converter can in this case be represented in the form of words, and the conversion of the information reduces to the establishment of some correspondence between these words. We recall that, with a suitable expansion of the alphabet, lexical information can be taken to include not only ordinary words but also entire sentences and even arbitrary sequences of sentences.

One of the characteristic tasks of the conversion of lexical information is the translation of texts from one language to another. It is well known that the translation problem does not reduce to the problem of establishing a correspondence between the words of the languages involved in the translation. If, however, we consider entire books, or at least individual sections of books, as words, then the problem of translation reduces completely to the problem of establishing a correspondence between such generalized words. Thus, translation from one language to another can be treated as the process of realizing some alphabetic operator.

It is worth noting, moreover, that even a high-quality, grammatical translation admits, as is known, certain variations in the translated text. Therefore the process of translation is described not by the usual single-valued alphabetic operator, but by a multi-valued, or so-called probabilistic, alphabetic operator. Such an operator associates with each input word from its domain of definition not a single output word, but a whole ensemble of output words. In each specific application of this operator to a particular input word p, the output word is selected at random from the ensemble of output words corresponding to p.
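A probabilistic operator can be sketched as a table from input words to ensembles of admissible output words, with a random choice made on each application. The table, the function name `probabilistic_operator` and the toy "translation" entries below are hypothetical; `random.Random` is used only so that runs can be seeded.

```python
import random

def probabilistic_operator(table, rng=random):
    """Multi-valued alphabetic operator: `table` maps each input word in the
    domain to its ensemble (a set) of admissible output words.
    Each application selects one output at random; None outside the domain."""
    def operator(word):
        outputs = table.get(word)
        if not outputs:
            return None
        # Sorting fixes the order, so a seeded rng gives reproducible choices
        return rng.choice(sorted(outputs))
    return operator

translate = probabilistic_operator({"bonjour": {"hello", "good day"}},
                                   random.Random(0))
print(translate("bonjour") in {"hello", "good day"})  # True
print(translate("merci"))                             # None
```

Repeated applications to the same input word may return different members of its ensemble, which is exactly what distinguishes this operator from a single-valued one.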

In addition to alphabetic operators for translation from one language to another, we can construct alphabetic operators which solve other problems of the conversion of lexical information, for example the problem of editing texts in a particular language, the problem of composing abstracts of articles, etc. It is not difficult to expand the field of application of alphabetic operators by using the alphabetic representation not only for lexical information but also for other forms of information. For example, using the known techniques of chess notation, we can write chess positions in the form of words consisting of letters of the Russian and Latin alphabets, digits, and punctuation marks (commas). In this case the process of a chess game can be interpreted as the process of establishing a correspondence between any given position and the position resulting from it after the next move. Thus, in this case too we are dealing with an alphabetic operator (a probabilistic one, generally speaking).

Similarly, it is not difficult to represent many other processes of
information conversion as processes of realizing alphabetic operators,
for example the orchestration of melodies, the solution of mathematical
problems, production planning, etc.

It may seem at first that the concept of the alphabetic operator is
insufficient for characterizing the conversion of continuous information
(for example, visual or auditory sensations). However, this is not so,
or, more precisely, not entirely so.

The reception and conversion of continuous information is always
accomplished with the aid of nonideal instruments, which do not react to
extremely small variations in the characteristics of the information
being converted. Real instruments that detect and convert continuous
information are always subject to several limitations, and these make
it possible to treat this information as alphabetic information. For
greater clarity, let us consider visual information (the same phenomena
occur with other forms of continuous information).


The first limitation is the resolving power of the instrument that
receives the information. Because of it, sufficiently closely spaced
points of the region of space over which the information is distributed
(for example, a picture or drawing) are sensed by the instrument (say,
the human eye) as a single point. Hence this information can be regarded
as given not at an infinite number of points but only at a finite number
of points.

The second limitation is associated with the limited sensitivity of the
instrument receiving the information. Because of it, the instrument can
distinguish only a finite number of levels of the quantity carrying the
information (for example, the brightness of individual points of a
drawing).

From these limitations we conclude that the instrument, because of its
nonideal nature, can at each given instant sense only one of a finite
number of different patterns of the instantaneous spatial distribution
of the information in question (and not an infinite number, as it might
seem if the indicated limitations were ignored).

Introducing a special letter for each such pattern, we arrive at a
finite alphabet A which, given the indicated limitations, is completely
adequate for characterizing the information arriving at the input of the
(nonideal) instrument under consideration at every given instant of
time. If we denote by n the number of spatial points sensed by the
instrument as individual points, and by m the number of levels of the
physical quantity carrying the information that the instrument
distinguishes, then the number of letters in the alphabet A is, as is
easy to see, equal to $m^n$, since each of the n points can
independently take any of the m levels (for simplicity we assume that
the number of levels distinguishable by the instrument is the same for
all points of the space).
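The count $m^n$ can be checked by brute force for small values. The
sketch below (the function name `pattern_alphabet` is hypothetical)
enumerates every pattern an instrument with n resolvable points and m
distinguishable levels could sense; each tuple plays the role of one
letter of the alphabet A.

```python
from itertools import product


def pattern_alphabet(n_points, m_levels):
    """All patterns a nonideal instrument can sense at one instant:
    one of m_levels levels at each of n_points resolvable points."""
    return list(product(range(m_levels), repeat=n_points))


alphabet = pattern_alphabet(n_points=3, m_levels=2)
print(len(alphabet))                # 8 == 2 ** 3
print(alphabet[0], alphabet[-1])    # (0, 0, 0) (1, 1, 1)
```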

Of course, the number of letters in the alphabet A just estimated may
turn out to be enormous (for the reception of visual information by the
human eye it may be estimated as a one followed by several thousand
zeros). Nevertheless it is still finite, and from the abstract
theoretical point of view the only essential question is whether the
alphabet A is finite or infinite.

Continuing our investigation, we note that every real instrument that
receives and converts information has, along with the two limitations
indicated, a third limitation: its limited passband, which does not
allow it to distinguish excessively rapid changes of the received
quantities. By the familiar Kotel'nikov principle (Ref 46), limiting the
passband is equivalent to replacing, during the transmission of
information, the usual continuous time by a conventional discrete time
whose neighboring instants differ from one another by quite definite
(although usually very small) intervals. Roughly speaking, we select as
such an elementary interval the maximal interval during which the
instrument in question is incapable of distinguishing variations of the
quantity carrying the information.
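As a worked illustration, assuming the standard form of the sampling
theorem (which the text cites but does not spell out): for a quantity
whose spectrum is confined to frequencies below F, the elementary
interval and the number of discrete instants in a segment of length t
are

$$\Delta t = \frac{1}{2F}, \qquad k = \left\lfloor \frac{t}{\Delta t} \right\rfloor = \lfloor 2Ft \rfloor .$$

For instance, with F = 5 kHz and t = 1 s this gives $\Delta t$ = 0.1 ms
and k = 10,000 instants.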

After this discrete time has been introduced, the information received
by our instrument during any finite segment of time t is naturally
represented as a word in the previously introduced alphabet A. The
number of letters in this word equals the number of instants
$\tau_1, \ldots, \tau_k$ of the discrete time located in the given
segment t, and its i-th letter (i = 1, 2, ..., k) is the information
received by the instrument at the instant $\tau_i$, expressed as a
letter of the alphabet A.
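The passage from a continuous signal to a word in A can be sketched as
follows, under simplifying assumptions that are not in the text: the
hypothetical function `to_word` samples a signal, given as a Python
function with values in [0, 1], at the instants $\tau_i = i\,\Delta t$,
quantizes each point uniformly into m levels, and encodes each resulting
pattern as one letter, an integer read in base m.

```python
def to_word(signal, n_points, m_levels, dt, k):
    """Turn a continuous signal into a word of k letters over a finite alphabet.

    signal(x, t) gives a brightness in [0, 1] at point x = 0..n_points-1 and
    time t. At each instant tau_i = i * dt (i = 1..k) the n quantized levels
    form one pattern, encoded as a single letter: an integer in
    range(m_levels ** n_points) obtained by reading the levels in base m."""
    word = []
    for i in range(1, k + 1):
        tau = i * dt
        letter = 0
        for x in range(n_points):
            # uniform quantization of [0, 1] into m_levels levels
            level = min(int(signal(x, tau) * m_levels), m_levels - 1)
            letter = letter * m_levels + level
        word.append(letter)
    return word


# Usage: point 0 stays dark, point 1 brightens over time (n = 2, m = 3).
def ramp(x, t):
    return 0.0 if x == 0 else min(t, 1.0)


print(to_word(ramp, n_points=2, m_levels=3, dt=0.25, k=4))   # [0, 1, 2, 2]
```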

Since analogous considerations apply not only to the input information
but also to the output information, any real information converter must
be regarded (given the limitations indicated above) as an instrument
realizing some alphabetic operator. The alphabetic operator realized by
the instrument determines completely (up to the coding of the
information) the informational essence of this instrument, in other
words, the information conversion it performs.

Thus, we have established the extremely great generality of the concept
of the alphabetic operator: the theory of any information converter
reduces to the study of alphabetic operators. And man encounters
information converters literally at every step of his practical
existence. The various instruments and devices for automatic control are
information converters. Finally, one of the most important and essential
aspects of the study of the activity of man himself is the consideration
of man as a very complex and highly perfected information converter. All
this makes it possible to regard the theory of alphabetic operators as
one of the most important component parts of cybernetics.

The basis of the theory of alphabetic operators is formed by the methods
of representing them. When the domain of definition of an alphabetic
operator is finite, the question of its representation, at least in the
theoretical sense, is resolved very simply: the operator can be
represented by a simple correspondence table. In the left side of such a
table we write out all the words in the domain of definition of the
operator in question, and in the right side the output words obtained by
applying the operator to each word from the left side of the table.
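A correspondence table translates directly into a lookup table. In this
sketch the dictionary `TABLE`, its entries and the function
`apply_table` are hypothetical; words outside the left column are
treated as outside the domain of definition.

```python
# Hypothetical finite alphabetic operator given by a correspondence table:
# keys = left side (domain of definition), values = right side (outputs).
TABLE = {
    "a": "b",
    "ab": "ba",
    "ba": "",       # the empty word is a legitimate output word
}


def apply_table(word):
    """Look up the output word; words not in the left side are rejected."""
    if word not in TABLE:
        raise ValueError(f"operator is not defined on {word!r}")
    return TABLE[word]


print(apply_table("ab"))   # ba
```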

Of course, if the domain of definition of the alphabetic operator is
sufficiently large, this method of representation can become excessively
cumbersome and therefore inapplicable in practice. For the moment,
however, we shall not take such considerations into account, limiting
ourselves to establishing only the theoretical possibility of
representing particular alphabetic operators.

When the domain of definition of an alphabetic operator is infinite,
representing it by a simple correspondence table becomes impossible in
principle, since man has no means that would permit him actually to
write out or perceive an infinite set of words. However, it is well
known that man long ago learned to represent operators on infinite sets
of words without writing out the entire correspondence tables. To see
this, it suffices to consider, for example, the alphabetic operator
represented by the formula

$$\underbrace{xx\ldots x}_{n\ \text{times}} \to \underbrace{yy\ldots y}_{n+1\ \text{times}} \qquad (n = 1, 2, \ldots) \tag{4}$$

This formula defines a correspondence on an infinite set of words
without actually writing out the entire correspondence table (which, of
course, cannot be done in this case). In place of the table itself, the
formula gives a rule with the aid of which, after a finite number of
steps, one can establish the output word corresponding to any prescribed
input word from the domain of definition of the alphabetic operator
under consideration.
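Formula (4) replaces the infinite table by a rule, and a rule is easy to
write as a short function. The following Python sketch (the function
name `formula_4` is an assumption) computes the output word for any
input of the form xx...x in finitely many steps and rejects words
outside the domain of definition.

```python
def formula_4(word):
    """Rule (4): the word x...x (n letters, n >= 1) -> y...y (n + 1 letters)."""
    if not word or set(word) != {"x"}:
        raise ValueError(f"{word!r} is not of the form x...x with n >= 1")
    return "y" * (len(word) + 1)


print(formula_4("xxx"))   # yyyy
```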

An analogous situation arises every time we need to represent an
alphabetic operator with an infinite domain of definition: in place of
the correspondence table itself, a finite number of rules is given which
permit finding, after a finite number of steps, any prescribed line of
this table (the value of the alphabetic operator on any input word in
its domain of definition).

Alphabetic  operators  represented  with  the  aid  of  finite  systems 
of  rules  are  customarily  termed  algorithms. 

From the discussion above it is easy to see that every alphabetic
operator which can actually be represented is necessarily an algorithm.
In particular, all alphabetic operators with finite domains of
definition represented by (finite) correspondence tables are algorithms.
Formula (4) also represents an algorithm.

It is not difficult to construct other examples of algorithms.
Associating with each positive integer its square, we obtain an
alphabetic operator in the alphabet consisting of all the digits of the
number system used to represent these numbers. Since the rules for
squaring make it possible to obtain, after a finite number of steps, the
square of any prescribed integer, this operator can be regarded as an
algorithm.
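To stress that the squaring operator works on words rather than on
abstract numbers, the sketch below squares a word over the digits 0-9
using only the schoolbook rules of long multiplication, without
converting the word to a Python integer. The function name
`square_digits` is hypothetical, and the decimal system is an
assumption.

```python
def square_digits(word):
    """Square a positive integer written as a word over the digits 0-9,
    using only the schoolbook rules of long multiplication."""
    if not word or not set(word) <= set("0123456789") or word[0] == "0":
        raise ValueError(f"{word!r} is not a positive integer in decimal")
    a = [int(c) for c in reversed(word)]     # least significant digit first
    result = [0] * (2 * len(a))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(a):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10       # keep one digit in place
            carry = total // 10              # carry the rest to the left
        result[i + len(a)] += carry
    while len(result) > 1 and result[-1] == 0:
        result.pop()                         # drop leading zeros
    return "".join(str(d) for d in reversed(result))


print(square_digits("999"))   # 998001
```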

All the specific alphabetic operators considered in the present chapter
(including the operators for translation from one language to another,
chess moves, etc.) can also be represented with the aid of finite
systems of rules and can, consequently, be regarded as algorithms.

We must emphasize one distinction between the concepts of the alphabetic
operator and the algorithm. In the concept of the alphabetic operator
only the correspondence itself, established by the operator between
input and output words, matters, and not the method by which this
correspondence is established. In the concept of the algorithm, on the
other hand, the primary emphasis is placed on the method of representing
the correspondence established by the algorithm. Thus, an algorithm is
nothing other than an alphabetic operator together with the rules
defining its operation.

The concept of equality for alphabetic operators and algorithms is
defined in accordance with the foregoing. Two alphabetic operators are
considered equal if they have the same domain of definition and
associate identical output words with any prescribed input word from
this domain. The concept of equality for algorithms includes the
condition of equality of the corresponding operators, but also requires
that the systems of rules representing the operation of these algorithms
on input words coincide. Algorithms for which only the alphabetic
transformations (operators) they define coincide, but, generally
speaking, not the methods of representation, will be termed equivalent
algorithms.
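The difference between equal and equivalent algorithms can be
illustrated by two different rule systems for one operator, erasing
every letter y from a word over {x, y}. Both functions are hypothetical
examples; the check over all words up to length 6 is only a finite spot
check, not a proof of equivalence on the infinite domain.

```python
from itertools import product


def erase_y_by_substitution(word):
    """Rule system 1: repeatedly delete the first left occurrence of y."""
    while "y" in word:
        i = word.index("y")
        word = word[:i] + word[i + 1:]
    return word


def erase_y_by_scan(word):
    """Rule system 2: copy the word letter by letter, skipping every y."""
    return "".join(c for c in word if c != "y")


# Different rules, same correspondence on every word over {x, y}
# up to length 6: the two algorithms behave as equivalent ones.
words = ["".join(t) for n in range(7) for t in product("xy", repeat=n)]
print(all(erase_y_by_substitution(w) == erase_y_by_scan(w) for w in words))  # True
```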

Usually in the abstract theory of algorithms we consider only those
algorithms to which single-valued alphabetic operators correspond. Every
algorithm A of this kind has the property that with any input word p
from its domain of definition it associates a completely determined
output word q = A(p), regardless of the conditions under which the
algorithm A operates. Such algorithms and their corresponding alphabetic
operators will be called determinate.

In many cases it is advisable to extend the concept of the algorithm by
introducing into the system of rules that describes it the possibility
of a random selection of particular words or particular rules. The
probability of a particular selection must then be either fixed in
advance or determined in the course of realizing the algorithm. Such
algorithms will be called random; they lead to multi-valued alphabetic
operators. More precisely, for any input word p in the domain of
definition of the random algorithm A, this algorithm uniquely defines
the probability $a_p(q)$ that a given output word q appears as the
response to the input word p. In the case of the usual random algorithm
the probabilities $a_p(q)$ must not vary while it functions, although
the algorithm itself can, of course, give different responses when
repeatedly applied to the same input word p.
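A random algorithm in this sense can be sketched as a fixed table of
probabilities $a_p(q)$ together with a sampling step. The table `A_P`,
its values and the function `random_algorithm` are hypothetical; the
point is that the distribution stays fixed while individual responses
vary.

```python
import random
from collections import Counter

# Hypothetical random algorithm: for each input word p a fixed
# distribution a_p(q) over output words q (it never changes).
A_P = {
    "xy": {"x": 0.75, "y": 0.25},
}


def random_algorithm(p, rng):
    """Draw one output word q with probability a_p(q)."""
    outputs, weights = zip(*A_P[p].items())
    return rng.choices(outputs, weights=weights, k=1)[0]


rng = random.Random(42)
trials = 10_000
counts = Counter(random_algorithm("xy", rng) for _ in range(trials))
print(counts["x"] / trials)   # close to a_p("x") = 0.75
```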

We also need to consider the so-called self-variable algorithms, i.e.,
algorithms which not only transform the input words applied to them but
themselves change in the course of this transformation. The result of
the action of a self-variable algorithm A on a particular input word p
depends not only on this word but also on the history of the preceding
operation of the algorithm, i.e., on the (finite) sequence of input
words processed by A before the word p arrived at its input.

The generalization of the concept of the algorithm by introducing the
possibility of self-variation applies to both determinate and random
algorithms. In the latter case, depending on the history of the previous
operation of the algorithm, the probabilities $a_p(q)$ of the different
output words q associated by the algorithm A with any given input word p
change. This dependence, moreover, may itself be expressed by a random
rather than a determinate function.

Self-variable algorithms are conveniently represented as a system of two
algorithms: the first, the so-called operational algorithm, processes
the input words, and the second, termed the monitoring or controlling
algorithm, introduces specific changes into the first, operational,
algorithm. In Chapter ... it is shown that the property of
self-variability of an algorithm is determined not so much by the
structure of the device realizing the algorithm as by the method of
partitioning the input information into individual words, which, as
noted above, is to a considerable degree arbitrary in the case of
abstract alphabets. Thus, depending on the choice of this method, the
same device may in some cases realize a self-variable algorithm and in
other cases a non-self-variable one.
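A minimal sketch of the operational/controlling decomposition follows,
with an arbitrary toy rule chosen only for illustration (the class and
method names are hypothetical): the operational algorithm prefixes the
input word with a current letter, and the controlling algorithm resets
that letter to the last letter of each word processed, so the same input
word can yield different outputs depending on history.

```python
class SelfVariableAlgorithm:
    """Hypothetical self-variable algorithm as a pair of algorithms."""

    def __init__(self, mark="x"):
        self.mark = mark                 # the changeable part of the rules

    def operate(self, p):
        """Operational algorithm: prefix p with the current letter."""
        return self.mark + p

    def control(self, p):
        """Controlling algorithm: adapt the operational one to the history."""
        if p:
            self.mark = p[-1]

    def __call__(self, p):
        q = self.operate(p)
        self.control(p)
        return q


alg = SelfVariableAlgorithm()
print(alg("ab"), alg("ab"), alg("yy"), alg("ab"))   # xab bab byy yab
```

Note that the same input word "ab" produced "xab", "bab" and "yab" at
different moments.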

Throughout the first three chapters we shall consider only conventional
(determinate, non-self-variable) algorithms, without making this
stipulation in every instance. In the later chapters use will also be
made of the generalized concepts of the algorithm introduced above.

§2.  NORMAL  ALGORITHMS 

In this and the following sections we shall study certain general
methods of representing algorithms that are characterized by the
property of universality, i.e., methods which make it possible to obtain
an algorithm equivalent to any prescribed algorithm. In this chapter
various universal methods of representing algorithms are discussed not
in the historical sequence in which they were developed, but in the
order most convenient from the point of view of the present volume. We
begin our exposition with the so-called normal algorithms suggested and
studied by Markov (Ref 53).

Every general method of representing algorithms is termed an algorithmic
system. An algorithmic system usually includes objects of a dual nature
which, following Kaluzhnin (Ref 37), we shall term operators (more
precisely, elementary operators) and identifiers (more precisely,
elementary identifiers). Elementary operators are quite simple (simply
represented) alphabetic operators whose sequential performance realizes
every algorithm of the algorithmic system in question.

Identifiers serve to recognize particular properties of the information
processed by the algorithm and, depending on the results of this
identification, to vary the sequence in which the elementary operations
follow one another.

To indicate the set of elementary operators and the order in which they
follow one another in the representation of a specific algorithm, it is
convenient to use directed graphs of a special kind, which, following
Kaluzhnin (Ref 37), we shall term the graph-diagrams of the
corresponding algorithms.

The graph-diagram of an algorithm is a finite set of circles (or other
geometrical figures), termed the elements of the graph-diagram, which
are interconnected by arrows. With each element other than the two
special elements termed the input and the output there is associated
some elementary operator or identifier. From each element representing
an operator, and also from the input element, there emerges precisely
one arrow; from each element representing an identifier there emerge
precisely two arrows; no arrow emerges from the output element. Any
number of arrows can enter an element.

The algorithm defined by a given graph-diagram operates as follows. The
input word first enters the input element and travels in the directions
indicated by the arrows, being transformed on passage through the
operator elements by the operators associated with these elements. When
the word enters an identifying element, the condition associated with
this element is checked (the identifier is applied). If the condition is
satisfied, the word leaves the element along one of the arrows (usually
marked with the symbol "+"); if it is not satisfied, the word leaves
along the other arrow (marked with the symbol "-").

The word is not altered in the identifying elements. If the input word
p, applied to the input element of the graph-diagram, passes through the
elements of the diagram and, after being transformed, arrives at the
output element after a finite number of steps, then the algorithm is
considered applicable to the word p (the word p is in the domain of
definition of the algorithm), and the result of the action of the
algorithm on p is the word that is then in the output element of the
diagram. If, after the word p is applied to the input element, its
transformation and movement along the graph-diagram last infinitely long
without arrival at the output element, then the algorithm is considered
not applicable to p; in other words, the word p is not in the domain of
definition of the algorithm.
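The operating rule for graph-diagrams can be sketched as a small
interpreter. The encoding of elements as tuples, the function
`run_graph` and the example diagram `ERASE_X` are hypothetical. Because
the true algorithm simply never halts on words outside its domain, the
sketch uses a step limit `max_steps` as a practical cutoff; this limit
is an addition not found in the text.

```python
def run_graph(graph, word, max_steps=10_000):
    """Pass `word` through a graph-diagram, starting at the input element.

    Each element of `graph` is one of:
      ("input", next)              the input element (one outgoing arrow)
      ("op", f, next)              operator element: word := f(word)
      ("id", cond, plus, minus)    identifier: word unchanged, branch on cond
      ("output",)                  the output element (no outgoing arrow)
    """
    node = "input"
    for _ in range(max_steps):
        element = graph[node]
        kind = element[0]
        if kind == "output":
            return word
        if kind == "input":
            node = element[1]
        elif kind == "op":
            word, node = element[1](word), element[2]
        elif kind == "id":
            node = element[2] if element[1](word) else element[3]
        else:
            raise ValueError(f"unknown element kind {kind!r}")
    raise RuntimeError("word did not reach the output element within max_steps")


# Hypothetical diagram: while x occurs in the word, delete the first x.
ERASE_X = {
    "input": ("input", "has_x"),
    "has_x": ("id", lambda w: "x" in w, "drop_x", "output"),
    "drop_x": ("op", lambda w: w.replace("x", "", 1), "has_x"),
    "output": ("output",),
}
print(run_graph(ERASE_X, "xyxxy"))   # yy
```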

Normal algorithms use only one type of elementary operator, termed
substitution operators, and one type of elementary identifier, termed
occurrence identifiers. We shall describe these identifiers and
operators in more detail. To do this we must first become acquainted
with the concept of the occurrence of one word in another.

Let p and q be two arbitrary words in a particular alphabet. We say that
the word q occurs in the word p if p can be represented in the form
$p = p_1 q p_2$, where $p_1$ and $p_2$ are some words, possibly even
empty ones. The occurrence of q in p so found is termed the first left
(or simply first) occurrence if, in the representation $p = p_1 q p_2$
under consideration, the word $p_1$ has the shortest possible length
among all such representations of the word p.

An occurrence identifier is given by indicating some fixed word q;
applying it to any given word p means checking whether or not the word q
occurs in p. A substitution operator is usually given in the form of two
words connected by an arrow, $q_1 \to q_2$. Its operation consists in
substituting the word $q_2$ in place of the first left occurrence of the
word $q_1$ in a given word p. If we separate the first occurrence of
$q_1$ in p explicitly, writing p in the form $p = p_1 q_1 p_2$, then
after the application of this operator p is transformed into the word
$p_1 q_2 p_2$.

In applying an occurrence identifier, we agree to mark off the found
(first left) occurrence of the identified word in the given word with
parentheses. For example, applying to the word p = xxyxyxx the
occurrence identifier of the word q = xy, we mark off the first
occurrence of q in p as follows: p = x(xy)xyxx.
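The definitions of the first left occurrence and of the substitution
operator translate directly into string operations: Python's `str.find`
returns the lowest index of an occurrence, which corresponds to the
shortest possible $p_1$. The function names `first_occurrence` and
`substitute` are hypothetical.

```python
def first_occurrence(p, q):
    """Return (p1, p2) with p == p1 + q + p2 and p1 as short as possible,
    or None if q does not occur in p. The empty word occurs in every word."""
    i = p.find(q)
    if i < 0:
        return None
    return p[:i], p[i + len(q):]


def substitute(p, q1, q2):
    """Substitution operator q1 -> q2 applied to the first left occurrence."""
    split = first_occurrence(p, q1)
    if split is None:
        return None                      # operator not applicable to p
    p1, p2 = split
    return p1 + q2 + p2


p1, p2 = first_occurrence("xxyxyxx", "xy")
print(f"{p1}(xy){p2}")                   # x(xy)xyxx
```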

Algorithms represented by graph-diagrams consisting exclusively of word
occurrence identifiers and substitution operators are termed generalized
normal algorithms. Here it is assumed that each substitution operator of
the form $q_1 \to q_2$ is entered by only a single arrow: the arrow with
the "+" sign emerging from the identifier of the word $q_1$.

An example of a graph-diagram of a generalized normal algorithm is shown
in Fig. 1, where the identifiers are drawn as rectangles. The operator
$xy \to$ denotes the substitution of the empty word in place of the
first occurrence of the word xy. In accordance with the notation for the
empty word used in the preceding section, this operator can also be
written in the form $xy \to e$.

Considering the operation of the algorithm A given by the graph-diagram
of Fig. 1, we note that the first operator from the top moves the
letters x to the left and the letters y to the right portion of the word
until the word takes the form xx...xyy...y (all x's precede all y's).
Only after the word has been reduced to this form does the second
operator come into action, annihilating pairs xy until only x's or only
y's remain in the word. If the original word p contained m x's and n
y's, then as a result of the operation of the algorithm A it is
transformed into the word q = A(p) of length |m - n|, consisting of only
x's (if m > n) or only y's (if n > m); for m = n the result is the empty
word.
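The behavior described for Fig. 1 can be reproduced in a short sketch.
Since the figure itself is not reproduced here, the first operator is
assumed to be $yx \to xy$ (consistent with the description of moving x's
left and y's right), and the second is the operator $xy \to e$ named in
the text; the function name `fig1_algorithm` is hypothetical.

```python
from itertools import product


def fig1_algorithm(p):
    """Assumed reading of Fig. 1: apply yx -> xy while yx occurs,
    then apply xy -> e (empty word) while xy occurs."""
    while "yx" in p:
        i = p.find("yx")
        p = p[:i] + "xy" + p[i + 2:]     # move one x a place to the left
    while "xy" in p:
        i = p.find("xy")
        p = p[:i] + p[i + 2:]            # annihilate one pair xy
    return p


print(fig1_algorithm("yxxyxxy"))   # x  (4 x's and 3 y's)

# Check the claimed result of length |m - n| on every word up to length 7.
for n in range(8):
    for t in product("xy", repeat=n):
        w = "".join(t)
        m_x, n_y = w.count("x"), w.count("y")
        want = "x" * (m_x - n_y) if m_x >= n_y else "y" * (n_y - m_x)
        assert fig1_algorithm(w) == want
```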

Having considered generalized normal algorithms, let us turn to the
characterization of normal algorithms proper. Normal algorithms are
those generalized normal algorithms whose graph-diagrams have a certain
special form. To describe this form, we note that, by the definition of
generalized normal algorithms given above, every operator $q_1 \to q_2$
occurs in their graph-diagrams paired with the identifier of the word
$q_1$.

Let us combine each such pair of elements in the graph-diagram into a
single element, retaining for it the notation of the corresponding
operator.


Fig. 1. a) Input; b) output.

Fig. 2. a) Input; b) output.


From each combined element there will emerge two arrows: an arrow with
the symbol "+", along which the word that has undergone the action of
the element's operator is directed, and an arrow with the symbol "-",
along which the word is directed if the element's operator is not
applicable to it. Nonapplicability of the substitution operator to a
word means that the left part of the operator (the word $q_1$ in the
operator $q_1 \to q_2$) does not occur in the given word.
Using the described technique for combining elements, the graph-diagram
of the algorithm shown in Fig. 1 can be represented by the diagram shown
in Fig. 2. In the case of normal algorithms, such a graph-diagram with
combined elements must satisfy the following conditions:

a) all the combined (identifier-operator) elements of the graph-diagram
are ordered by assigning them sequential numbers from 1 to n; the
negative output (the arrow with the symbol "-") of the i-th element is
connected to the (i + 1)-th element (i = 1, ..., n - 1), and the
negative output of the n-th element is connected to the output element
of the graph-diagram;

b) the positive outputs (arrows with the symbol "+") of all the combined
elements are connected either to the first combined element or to the
output element of the graph-diagram. In the first case the substitution
performed by the operator of the corresponding element is termed
ordinary, in the second case final;

c) the input element is connected by an arrow to the first combined
(identifier-operator) element.

These conditions are necessary and sufficient for a graph-diagram
satisfying them to represent a normal algorithm proper rather than a
generalized normal algorithm. It is easy to verify that the
graph-diagram shown in Fig. 2 is not the graph-diagram of a normal
algorithm, since it does not satisfy the second of the conditions just
formulated (condition "b").

Normal algorithms are customarily represented not by graph-diagrams but
simply by the ordered list of the substitutions of all the operators of
the given algorithm, termed the diagram of the given algorithm. Here
ordinary substitutions are written, as shown above, as two words
connected by an arrow ($q_1 \to q_2$), while final substitutions are
designated by an arrow with a dot ($q_1 \to \cdot\, q_2$).

The order in which the substitutions are performed is then completely
determined by conditions "a", "b" and "c". Indeed, by these conditions
the i-th substitution of the diagram must be performed if, and only if,
it is the first applicable substitution (none of the substitutions from
the 1st to the (i - 1)-th is applicable). After an ordinary substitution
the search again starts from the first substitution. The process of
performing substitutions terminates only when none of the substitutions
of the diagram is applicable to the word obtained, or when some final
substitution is performed (for the first time).
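The rule just stated (apply the first applicable substitution at the
first left occurrence of its left part, restart after an ordinary
substitution, stop after a final one or when nothing applies) can be
sketched as an interpreter. Representing a diagram as a list of
`(q1, q2, final)` triples and the name `run_normal` are assumptions;
`max_steps` is a practical cutoff for computations that never
terminate.

```python
def run_normal(scheme, p, max_steps=10_000):
    """Run a normal algorithm given as an ordered list of (q1, q2, final)."""
    for _ in range(max_steps):
        for q1, q2, final in scheme:
            i = p.find(q1)               # first left occurrence of q1
            if i >= 0:
                p = p[:i] + q2 + p[i + len(q1):]
                if final:
                    return p             # final substitution: stop
                break                    # ordinary: restart from the 1st one
        else:
            return p                     # no substitution is applicable
    raise RuntimeError("no termination within max_steps")


# Usage: the single ordinary substitution yx -> xy sorts x's before y's.
print(run_normal([("yx", "xy", False)], "yxyx"))   # xxyy
```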

As an example, let us consider the operation of the normal algorithm A
given by the diagram

$$
\begin{aligned}
yyx &\to y\\
xx &\to y\\
yyy &\to \cdot\, x
\end{aligned}
$$

Let us assume that we are given the input word p = xyxxxyy. The first
substitution of the algorithm A is not applicable to this word. To apply
the second substitution, we isolate the first occurrence of its left
part (xx) in the word p: p = xy(xx)xyy. After performing the second
substitution we obtain the word $p_1$ = xyyxyy, to which the first
substitution of the algorithm is applicable:
$p_1 = x(yyx)yy \to xyyy = p_2$. Only the third substitution is
applicable to the resulting word: $p_2 = x(yyy) \to xx = p_3$, and since
it is a final substitution, the word $p_3$ is the final result of the
action of the algorithm A on the original word p, i.e., $p_3 = A(p)$.

If the third substitution of the algorithm A were not a final
substitution, the process of substitution could be continued, and in
place of the word p3 = xx we would obtain the word p4 = y as the result
of the action of the algorithm on the original word p.
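The worked example can be replayed step by step. The sketch below is illustrative only: it assumes the diagram of A reconstructed above and the same (left, right, is_final) representation, and the helper name trace is hypothetical. It prints every intermediate word, first for A as given and then with its third substitution made ordinary.

```python
from typing import List, Tuple

Rule = Tuple[str, str, bool]  # (left part, right part, is the substitution final?)

# The diagram of algorithm A from the example above.
ALGORITHM_A: List[Rule] = [("yyx", "y", False), ("xx", "y", False), ("yyy", "x", True)]


def trace(diagram: List[Rule], word: str, max_steps: int = 1000) -> List[str]:
    """Return the sequence of words p, p1, p2, ... produced by the diagram."""
    words = [word]
    for _ in range(max_steps):
        for left, right, is_final in diagram:
            if left in word:
                word = word.replace(left, right, 1)  # first occurrence only
                words.append(word)
                if is_final:
                    return words
                break
        else:
            return words  # no applicable substitution
    raise RuntimeError("step budget exhausted")


print(trace(ALGORITHM_A, "xyxxxyy"))
# ['xyxxxyy', 'xyyxyy', 'xyyy', 'xx']

# Same diagram, but with the third substitution made ordinary:
print(trace([r[:2] + (False,) for r in ALGORITHM_A], "xyxxxyy"))
# ['xyxxxyy', 'xyyxyy', 'xyyy', 'xx', 'y']
```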
The use of final substitutions in normal algorithm diagrams, along with
the ordinary substitutions, is necessary in order to be able to realize
in such diagrams arbitrary constructive alphabetic operators, i.e.,
those alphabetic operators which are determined by a finite number of
rules. Indeed, any normal algorithm A whose diagram does not contain a
single final substitution can terminate its operation only when none of
its substitutions is further applicable. This implies directly that a
repeated application of the algorithm A to the word A(p), obtained by
applying it to any input word p, cannot change this word. In other
words, the following identity (valid for any input word p) is satisfied
for the algorithm A (see Markov [53]):

A(A(p)) = A(p).    (5)
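Relation (5) can be checked exhaustively on short words for a concrete diagram without final substitutions. This is a minimal sketch, assuming a compact interpreter of the kind described above and a hypothetical one-substitution diagram xy → yx (it moves every y to the left of every x); neither comes from the source.

```python
from itertools import product
from typing import List, Optional, Tuple

Rule = Tuple[str, str, bool]


def run(diagram: List[Rule], word: str, max_steps: int = 10_000) -> Optional[str]:
    """Compact normal-algorithm interpreter (first rule, leftmost occurrence)."""
    for _ in range(max_steps):
        for left, right, is_final in diagram:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:
            return word
    return None


# Hypothetical diagram with no final substitutions.
SORT: List[Rule] = [("xy", "yx", False)]

for n in range(6):
    for letters in product("xy", repeat=n):
        p = "".join(letters)
        once = run(SORT, p)
        assert run(SORT, once) == once  # relation (5): A(A(p)) = A(p)
print("relation (5) holds for all words of length < 6")
```

Because the algorithm halts only when no substitution applies, its output is already a word on which it halts at once.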

By no means does every constructive alphabetic operator satisfy this
relation. An example of an alphabetic operator for which relation (5)
does not hold is the operator B whose action on any word p amounts to
prefixing some fixed letter x to the left of this word: B(p) = xp. Here
B(B(p)) = xxp differs from B(p) = xp. From what we have said above it is
clear that this operator cannot be realized by a normal algorithm whose
diagram contains no final substitutions.

At the same time it is easy to verify that this operator is realized by
the normal diagram consisting of the single final substitution → ·x
(or, what is the same, e → ·x). Indeed, by the definition of occurrence
adopted above, the empty word occurs in every word p, and its first
occurrence has no letter to its left. It follows directly that applying
this substitution to an arbitrary word p converts it into the word xp.
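The single-substitution diagram e → ·x can be tried directly. A minimal sketch, assuming the same compact interpreter as before (restated so the block is self-contained); it relies on the fact that Python's `str.replace` with an empty pattern and `count=1` inserts at position 0, i.e., at the first occurrence of the empty word.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, bool]


def run(diagram: List[Rule], word: str, max_steps: int = 10_000) -> Optional[str]:
    """Compact normal-algorithm interpreter (first rule, leftmost occurrence)."""
    for _ in range(max_steps):
        for left, right, is_final in diagram:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:
            return word
    return None


# Diagram of B: the single final substitution e -> .x (empty left part).
PREFIX_X: List[Rule] = [("", "x", True)]

for p in ["", "y", "xyy"]:
    assert run(PREFIX_X, p) == "x" + p

# Relation (5) fails for B: a second application adds another x.
once = run(PREFIX_X, "y")
twice = run(PREFIX_X, once)
print(once, twice)  # xy xxy
```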

It is no less evident that in constructing the theory of normal
algorithms we cannot limit ourselves to final substitutions only.
Indeed, a normal algorithm whose diagram consists only of final
substitutions operates on each input word p with at most one of these
substitutions, after which the output word A(p) is obtained immediately.
In view of the finiteness of the algorithm diagram, the absolute value of
the difference of the lengths of the words p and A(p) is bounded
uniformly (for any choice of the input word p) by one and the same number
N: the maximum, over the substitutions of the algorithm A, of the absolute
difference of the lengths of the words in their left and right parts.
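The bound N can be computed from the diagram and checked against actual runs. This is a minimal sketch, assuming the compact interpreter used earlier and a hypothetical finals-only diagram xy → ·yyyx, y → · chosen for illustration; length_bound is likewise a hypothetical name.

```python
from itertools import product
from typing import List, Optional, Tuple

Rule = Tuple[str, str, bool]


def run(diagram: List[Rule], word: str, max_steps: int = 10_000) -> Optional[str]:
    """Compact normal-algorithm interpreter (first rule, leftmost occurrence)."""
    for _ in range(max_steps):
        for left, right, is_final in diagram:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:
            return word
    return None


def length_bound(diagram: List[Rule]) -> int:
    """N: the largest |len(right) - len(left)| over the substitutions."""
    return max(abs(len(right) - len(left)) for left, right, _ in diagram)


# Hypothetical diagram made only of final substitutions.
FINALS_ONLY: List[Rule] = [("xy", "yyyx", True), ("y", "", True)]
N = length_bound(FINALS_ONLY)  # 2

# At most one substitution is ever performed, so the length changes by <= N.
for n in range(8):
    for letters in product("xy", repeat=n):
        p = "".join(letters)
        assert abs(len(run(FINALS_ONLY, p)) - len(p)) <= N
```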

There do exist, however, simple constructive operators for which the
absolute differences of the lengths of the input words and the
corresponding output words are not uniformly bounded. An example of such
an operator is the operator D that doubles the input words, whose action
on any input word p is determined by the equality D(p) = pp; here the
length grows by the length of p itself, which is unbounded. From what we
have said above, it is clear that representing this operator as a normal
algorithm whose diagram contains only final substitutions is impossible.

Thus, if we impose on an algorithmic system based on normal algorithms
the requirement of universality (the possibility of constructing a
normal algorithm equivalent to any a priori specified algorithm), then a
necessary condition for such universality is the use of both forms of
substitutions, final and ordinary. This condition is also sufficient,
i.e., we can formulate the normalization principle (see Markov [53]):

Normalization principle. For any algorithm (constructively given
alphabetic representation) in an arbitrary finite alphabet A we can
construct an equivalent normal algorithm on the alphabet A.

The concept of a normal algorithm on an alphabet, which is used in the
formulation of the normalization principle, means the following. In
many cases it is not possible to construct a normal algorithm equivalent
to a given algorithm (in the alphabet A) if only letters of the alphabet
A are used in the substitutions of the algorithm. However, we can
construct the required normal algorithm by adding to the alphabet A some
number of new letters or, as is usually said, by performing an expansion
of the alphabet A. In this case it is customary to say that the
constructed (normal) algorithm is an algorithm on the alphabet A. We
agree, however, that in spite of the expansion of the alphabet, the
algorithm will, as before, be applied only to words in the original
alphabet A.
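The doubling operator D discussed above shows both points at once: it needs ordinary substitutions, and it is convenient to build it on an expanded alphabet. The diagram below is a hypothetical construction made for this illustration, not one taken from Markov [53]. It works on the alphabet {a, b}, expanded by four auxiliary letters: copies A and B, a scanner # and a converter $. It also uses more auxiliary letters than the single one which, by the result cited next, would suffice. The scanner moves left to right; each scanned letter leaves a copy in front of #, and the original letter bubbles left past the accumulated copies. At the end, # becomes $, which walks back turning the copies into a and b, and is then erased by the only final substitution.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, bool]


def run(diagram: List[Rule], word: str, max_steps: int = 10_000) -> Optional[str]:
    """Compact normal-algorithm interpreter (first rule, leftmost occurrence)."""
    for _ in range(max_steps):
        for left, right, is_final in diagram:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:
            return word
    return None


# Hypothetical normal algorithm on the alphabet {a, b} (expanded by A, B, #, $).
DOUBLE: List[Rule] = [
    ("Aa", "aA", False), ("Ab", "bA", False),    # originals move left past copies
    ("Ba", "aB", False), ("Bb", "bB", False),
    ("#a", "aA#", False), ("#b", "bB#", False),  # scan a letter, leave a copy
    ("#", "$", False),                           # scanning finished
    ("A$", "$a", False), ("B$", "$b", False),    # turn copies back into a, b
    ("$", "", True),                             # erase the marker and stop
    ("", "#", False),                            # start: put the scanner in front
]

for p in ["", "a", "ab", "ba", "aab", "abba"]:
    print(repr(p), "->", repr(run(DOUBLE, p)))
```

The ordering matters: the swap rules come first so that each original letter finishes moving left before the next one is scanned, and the start rule e → # comes last so that it fires only on the untouched input word.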

As shown by Markov [53] and Nagornyy [58], if we can construct a normal
algorithm equivalent to a given algorithm in the alphabet A by adjoining
to the alphabet A some (possibly very large) finite number of letters,
then we can construct an equivalent normal algorithm by adjoining to the
alphabet A only a single additional letter.

It is not possible to give a rigorous mathematical proof of the
normalization principle, since the concept of an arbitrary algorithm is
not a rigorously defined mathematical concept. Therefore, we must
approach its substantiation just as we approach the substantiation of any
law or principle of natural science. The substantiation which we can give
the normalization principle in this framework makes it possible to
consider the principle credible to a very high degree. We shall indicate
the basic steps of this substantiation. In order to simplify the
formulations, we shall agree, following Markov [53], to term a
particular algorithm normalizable if we can construct an equivalent
normal algorithm (possibly using an expansion of the alphabet), and to
term it unnormalizable otherwise. We can now state the normalization
principle in a somewhat altered form:

All  algorithms  are  normalizable. 
The validity of this principle rests first of all on the fact that all
the algorithms known at the present time are normalizable. Since a
considerable number of different algorithms have been devised in the
course of the long history of the exact sciences, this statement is
convincing in itself.

In actuality it is even more convincing. We can show that all the
methods known at the present time for composing algorithms, which make
it possible to construct new algorithms from already known ones, do not
go beyond the limits of the class of normalizable algorithms. In other
words, if the original algorithms are normalizable, then any composition
of these algorithms (among the forms of composition known at the present
time) will also be normalizable. This implies that to construct an
example of an unnormalizable algorithm it would be necessary to use
techniques qualitatively different from everything mathematicians have
encountered up till now.

However, this is not all. A whole series of scientists have made special
attempts to construct algorithms of a more general form, and none of
these attempts has gone beyond the limits of the class of normalizable
algorithms. We shall consider one of these attempts (the algorithmic
scheme of Kolmogorov-Uspenskiy) below. The failure of these attempts is
in itself the most striking evidence in favor of the validity of the
normalization principle.

Thus the normalization principle should be considered sufficiently
substantiated, although this substantiation does not completely exclude
the possibility of its refutation in the future (by the construction of
an example of an unnormalizable algorithm). In any case, the
normalizable algorithms encompass a significant portion of all
algorithms (if not all of them), and therefore the system of normal
algorithms can in practice be considered a universal algorithmic system.

Let us now consider some of the common forms of composition of
algorithms mentioned above. We shall define the composition not of the
algorithms themselves but of their corresponding alphabetic
representations; however, as remarked above, the possibility of
normalizing the result of a composition of normal algorithms makes it
possible (at least in the class of normal algorithms) to extend the
definition of the composition of representations to the composition of
the algorithms themselves.

One of the most common forms of composition of algorithms
(representations) is the superposition of algorithms. In the
superposition of two algorithms A and B, the output word of the first
algorithm (A) is taken as the input word of the second algorithm (B), so
that the result of the superposition of the algorithms A and B can be
represented in the form D(p) = B(A(p)). This definition extends to the
superposition of any finite number of algorithms.
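Superposition can be modelled on partial alphabetic operators as ordinary function composition. A minimal sketch, assuming an operator is a Python function that returns None where it is undefined; superpose and the example operators are hypothetical names, with prefix_x and double standing for the operators B(p) = xp and D(p) = pp discussed earlier.

```python
from typing import Callable, Optional

# A partial alphabetic operator: maps a word to a word, or to None where undefined.
Operator = Callable[[str], Optional[str]]


def superpose(*ops: Operator) -> Operator:
    """Superposition: the output word of each operator is the input of the next."""
    def composed(word: str) -> Optional[str]:
        for op in ops:
            out = op(word)
            if out is None:
                return None  # undefined as soon as any stage is undefined
            word = out
        return word
    return composed


# Hypothetical example operators.
prefix_x: Operator = lambda p: "x" + p                 # B(p) = xp
double: Operator = lambda p: p + p                     # D(p) = pp
drop_first: Operator = lambda p: p[1:] if p else None  # undefined on the empty word

print(superpose(prefix_x, double)("y"))    # xyxy, i.e. double(prefix_x("y"))
print(superpose(drop_first, double)(""))   # None
```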

A superposition of generalized normal algorithms can be considered a
generalized normal algorithm. For this it is sufficient to join the
output element of the graph-diagram of each preceding algorithm to the
input element of the succeeding algorithm. The normalization of a
superposition of normal algorithms requires considerable skill; however,
it too can always be accomplished [53].

We  shall  point  out  some  other  forms  of  compositions  of  algorithms. 

The union of the algorithms A and B in the same alphabet X is the term
given to the algorithm C in that alphabet which transforms any input
word p contained in the intersection of the domains of definition of the
algorithms A and B into the words A(p) and B(p) written side by side;
this algorithm is considered undefined on all the remaining input words.


A ramification of algorithms is a composition of the three algorithms
A, B and C. Designating the result of this composition by D, we shall
consider that the domain of definition of the algorithm D coincides with
the intersection of the domains of definition of all three algorithms A,
B and C, and that for any word p from this intersection D(p) = A(p) if
C(p) = e, and D(p) = B(p) if C(p) ≠ e.
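Union and ramification can be sketched in the same style as superposition. A minimal sketch, assuming partial operators are Python functions returning None where undefined and the empty word e is the empty string; union, ramification and the example operators are hypothetical names. Both compositions evaluate every component first, so that the result is undefined outside the intersection of the domains, as the definitions require.

```python
from typing import Callable, Optional

Operator = Callable[[str], Optional[str]]
EMPTY = ""  # the empty word e


def union(a: Operator, b: Operator) -> Operator:
    def c(p: str) -> Optional[str]:
        ra, rb = a(p), b(p)
        if ra is None or rb is None:
            return None  # p lies outside the intersection of the two domains
        return ra + rb   # A(p) and B(p) written side by side
    return c


def ramification(a: Operator, b: Operator, c: Operator) -> Operator:
    def d(p: str) -> Optional[str]:
        ra, rb, rc = a(p), b(p), c(p)
        if ra is None or rb is None or rc is None:
            return None  # outside the intersection of all three domains
        return ra if rc == EMPTY else rb
    return d


# Hypothetical operators for illustration (alphabet {x, y}).
identity: Operator = lambda p: p
reverse: Operator = lambda p: p[::-1]
x_letters: Operator = lambda p: p.replace("y", "")  # empty iff p contains no x

print(union(identity, reverse)("xyy"))                   # xyyyyx
print(ramification(identity, reverse, x_letters)("yy"))  # yy
print(ramification(identity, reverse, x_letters)("xy"))  # yx
```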

A repetition (iteration) is a composition of the two algorithms A and B.
Designating the result of this composition by P, we define that for any
input word q the corresponding output word P(q) is determined by the
following condition: there exists a series of words
q = q_0, q_1, q_2, ..., q_n = P(q) such that q_i = A(q_{i-1}) for all
i = 1, 2, ..., n; B(q_i) ≠ e for all i = 1, 2, ..., n - 1; and
B(q_n) = e. In other words, the algorithm A is applied repeatedly until a
word is obtained which the algorithm B transforms into the empty word e
(we can, of course, select any other fixed word instead of the empty
word).
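The repetition can be sketched as a loop. This is a minimal sketch, assuming partial operators return None where undefined, and reading the index ranges above as meaning that A is applied at least once (n ≥ 1); iterate, max_steps and the example operators are hypothetical. Returning None when the budget runs out stands for "possibly undefined", since in general P(q) may not exist.

```python
from typing import Callable, Optional

Operator = Callable[[str], Optional[str]]


def iterate(a: Operator, b: Operator, max_steps: int = 10_000) -> Operator:
    """Repetition: apply A until B maps the current word to the empty word."""
    def p(q: str) -> Optional[str]:
        for _ in range(max_steps):
            nxt = a(q)            # q_i = A(q_{i-1})
            if nxt is None:
                return None       # A undefined, so P is undefined
            q = nxt
            check = b(q)
            if check is None:
                return None       # B undefined, so P is undefined
            if check == "":
                return q          # B(q_n) = e, hence P(q_0) = q_n
        return None               # budget exhausted (P may be undefined here)
    return p


# Hypothetical example: A deletes the first letter; B yields the empty
# word once at most two letters remain.
drop_first: Operator = lambda w: w[1:] if w else None
short: Operator = lambda w: "" if len(w) <= 2 else w

print(iterate(drop_first, short)("xyzw"))  # zw
```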

All the methods described for the composition of normal algorithms lead
to normalizable algorithms [53].

Of very great importance for normal algorithms, as for every universal
algorithmic system, is the problem of constructing the so-called
universal algorithm. Let us consider the universal algorithm as applied
to normal algorithms.

Suppose we are required to construct a normal algorithm which will
perform the operation of any normal algorithm, given the diagram (set of
substitutions) of this latter algorithm.

The problem of the universal algorithm can be formulated exactly in
various ways. We shall describe one of the most natural. To do this we
first of all fix some standard alphabet X (for example, binary). For all
other possible alphabets we fix some definite method of coding their
letters in the selected standard alphabet. In the case of the binary
standard alphabet this can be done, for example, as follows: the letters
of any given alphabet are numbered sequentially by the natural numbers,
after which the i-th letter is assigned the binary code which begins and
ends with a zero and has exactly i ones between these zeros. If the total
number of letters in the given alphabet is n, then we also introduce
additional ((n + 1)-st, (n + 2)-nd, etc.) letters to designate the
symbols used in the diagrams of normal algorithms (arrows, dots, the
separation sign between formulas), and also to designate the special end
sign which stands at the beginning and at the end of the algorithm
diagram.

After writing the algorithm diagram as a single word and coding the
letters of this word by the method just described, we obtain a word in
the standard alphabet, which is termed the transform of the given
algorithm. For example, for the normal algorithm A given by the diagram

xy → x
y → ·

the transform Au of the algorithm A in the binary alphabet can be
obtained as follows. We fix the numbering of the letters, taking x to be
the first letter, y the second, the arrow the third, the dot the fourth,
the separation symbol the fifth, and the end symbol the sixth. Then the
transform Au of the algorithm A is written as:
060 010 020 030 010 050 020 030 040 060. Here, for brevity, in place of
writing out a row of some positive number n of ones we have written the
number n itself; thus 030 stands for 01110.
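The coding of the example diagram can be reproduced mechanically. This is a minimal sketch, assuming the numbering fixed in the example (x = 1, y = 2, arrow = 3, dot = 4, separator = 5, end = 6) and a diagram stored as (left, right, is_final) triples; the names letter_code and transform_numbers and the symbolic keys "ARROW", "DOT", "SEP", "END" are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str, bool]  # (left part, right part, is_final)


def letter_code(i: int) -> str:
    """Binary code of the i-th letter: a zero, exactly i ones, a zero."""
    return "0" + "1" * i + "0"


def transform_numbers(diagram: List[Rule], numbering: Dict[str, int]) -> List[int]:
    """Letter numbers of the diagram written out as a single word."""
    seq = [numbering["END"]]
    for k, (left, right, is_final) in enumerate(diagram):
        if k > 0:
            seq.append(numbering["SEP"])  # separation sign between formulas
        seq += [numbering[ch] for ch in left]
        seq.append(numbering["ARROW"])
        if is_final:
            seq.append(numbering["DOT"])
        seq += [numbering[ch] for ch in right]
    seq.append(numbering["END"])
    return seq


# Numbering from the example: x, y, arrow, dot, separator, end sign.
NUMBERING = {"x": 1, "y": 2, "ARROW": 3, "DOT": 4, "SEP": 5, "END": 6}
DIAGRAM_A: List[Rule] = [("xy", "x", False), ("y", "", True)]

numbers = transform_numbers(DIAGRAM_A, NUMBERING)
print(" ".join(f"0{n}0" for n in numbers))
# 060 010 020 030 010 050 020 030 040 060  (abbreviated form, as in the text)
print("".join(letter_code(n) for n in numbers))  # the binary word Au itself
```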

Along with the transform of the algorithm A, the coding in the standard
alphabet X described above also yields the transform pu of any input
word p of this algorithm.

The following theorem on the universal normal algorithm holds (see
Markov [53]):

There exists a normal algorithm U, termed a universal normal algorithm,
which, for any normal algorithm A and any input word p from the domain of
definition of A, transforms the word Aupu (obtained by appending the
transform of the word p to the transform of the algorithm A) into the
transform of the corresponding output word A(p) into which the algorithm
A transforms the word p. If, however, the word p is chosen so that the
algorithm A is not applicable to it, then the universal algorithm U is
not applicable to the word Aupu.
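The input/output behaviour described by the theorem can be imitated by an ordinary program. The sketch below is not Markov's algorithm U, which is itself a normal algorithm; it is a Python interpreter with the same behaviour on transforms. It assumes that the special signs are numbered as in the example above (arrow, dot, separator and end sign are letters n + 1, n + 2, n + 3 and n + 4), so the first letter of Au, the end sign, reveals the whole numbering. The names encode, decode, find and universal are hypothetical, and None stands for "no result within the step budget".

```python
from typing import List, Optional, Tuple


def encode(numbers: List[int]) -> str:
    """Binary transform of a word given by letter numbers: 0, i ones, 0."""
    return "".join("0" + "1" * i + "0" for i in numbers)


def decode(word: str) -> List[int]:
    """Recover the letter numbers from a well-formed binary transform."""
    numbers, pos = [], 0
    while pos < len(word):
        close = word.index("0", pos + 1)  # the zero that ends this letter
        numbers.append(close - pos - 1)   # number of ones between the zeros
        pos = close + 1
    return numbers


def find(seq: List[int], pat: List[int]) -> int:
    """Index of the first occurrence of pat in seq, or -1 (empty pat gives 0)."""
    for i in range(len(seq) - len(pat) + 1):
        if seq[i:i + len(pat)] == pat:
            return i
    return -1


def universal(word: str, max_steps: int = 10_000) -> Optional[str]:
    """Given Aupu, return the transform of A(p); None if the budget runs out."""
    codes = decode(word)
    end = codes[0]                    # the end sign opens Au
    arrow, dot, sep = end - 3, end - 2, end - 1  # assumed numbering n+1..n+4
    close = codes.index(end, 1)       # the end sign that closes Au
    body, p = codes[1:close], codes[close + 1:]

    rules: List[Tuple[List[int], List[int], bool]] = []
    current: List[int] = []
    for c in body + [sep]:            # a trailing separator flushes the last rule
        if c != sep:
            current.append(c)
            continue
        k = current.index(arrow)
        final = current[k + 1:k + 2] == [dot]
        start = k + 2 if final else k + 1
        rules.append((current[:k], current[start:], final))
        current = []

    for _ in range(max_steps):
        for left, right, final in rules:
            i = find(p, left)
            if i >= 0:
                p = p[:i] + right + p[i + len(left):]
                if final:
                    return encode(p)
                break
        else:
            return encode(p)          # no substitution applies
    return None
```

For the example diagram xy → x, y → · and p = xxy, the first substitution yields xx and then nothing applies, so universal returns the transform of xx.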

This theorem is of tremendous value, since it implies the possibility of
constructing a machine which can perform the operation of any normal
algorithm and hence, in view of the normalization principle, the
operation of any algorithm whatever. For this purpose it is sufficient
to insert into the machine a program, i.e., the transform of the normal
(normalized) algorithm whose operation the machine is to perform.

However, although the possibility of normalizing all the algorithms
known at the present time has been proved in principle, actually
carrying out the normalization is a very serious matter even for
relatively simple algorithms (the algorithm for the multiplication of two
whole numbers, for example). This means that programming a machine which
simulates the universal normal algorithm would be excessively unwieldy
and impractical. Therefore, in practice, machines capable of performing
the operation of any algorithm are designed on the basis of other
algorithmic systems, which differ from the system of normal algorithms.
These systems are described in Chapter 5.

§3. THE KOLMOGOROV-USPENSKIY ALGORITHMIC DIAGRAM

The present section describes the method suggested by Kolmogorov and
Uspenskiy [43] for defining algorithms of the most general form. To
construct the corresponding algorithmic diagram they chose a method based
only on those properties which are unquestionably inherent in any
algorithmic diagram, realizing these properties in particular specific
forms without any loss of generality.

In the construction of such a generalized algorithmic diagram it is
useful to picture, as a visualizable model, a man who is performing a
computation or other processing of information in accordance with a
particular precisely prescribed system of rules. The man performs the
role of information converter, while the converted information itself is
located outside of the man. We shall assume for definiteness that this
information is written on sheets of paper, and that the man has at his
disposal an unlimited supply of clean sheets and an unlimited reserve of
space for storing filled-out sheets. The transformation of the
information realized by the man is broken down into individual discrete
steps. At each such step the man surveys some number of completed sheets
and, depending on the contents of these records and using a strictly
defined, time-invariant system of rules stored in his memory, he performs
certain alterations in the reviewed information. These alterations may
be of three kinds: erasure (annihilation) of all or part of the reviewed
information, recording of new information on the reviewed sheets, and
alteration of the ensemble of reviewed sheets.

At first glance it seems that requiring the system of rules used for
processing the information to be invariant significantly narrows the
range of problems considered in comparison with the problems which man
can actually solve, since man is capable of altering the rules in the
course of the work. In actuality this limitation is not significant,
since the nature of the alteration of the information at each step of
the processing depends not only on the rules of the transformation but
also on the information itself. Hence, if it becomes necessary to vary
the nature of the information transformation in the course of time, one
can introduce the corresponding changes into the information itself
rather than into the rules stored in the memory of the processor; in
other words, the required alterations of the rules can be written down
on the sheets of paper instead of being memorized.

An absolutely necessary limitation in the design of any algorithmic
system is that the information processor can take in only a limited
quantity of information at any given instant. If the total volume of the
material being processed exceeds the volume of this active zone of the
processor, the information must be brought into the processing
gradually, step by step.

After these preliminary remarks we turn directly to the description of
the Kolmogorov-Uspenskiy diagram. The information in this diagram, as in
general in the case of alphabetic conversions, is written with the aid of
a finite number of symbols, or letters, which we shall designate as
T0, T1, ..., Tn. To achieve the greatest possible generality, we shall
also establish certain relations between the symbols, these relations
belonging to one of the types R1, R2, ..., Rm.

For each type of relation Ri we fix the number ki of related symbols
(letters). We designate by K the maximal number among the numbers
k1, k2, ..., km. The relations between the symbols are introduced in
order to take account of the case of complex letters which designate,
for example, entire phrases in ordinary language. In that case the
composition of the phrase (letter) may include indications of the
relative positioning of information (other letters) which is directly
related to the letter in question (say, information which must be
brought into consideration at the following step of the algorithm). The
number of related symbols is limited because of the boundedness of the
information contained in each letter (otherwise the letter could not be
contained entirely in the active zone and would have to be divided into
individual portions).


Let us assume that all the relations in which any given letter can occur
are ordered in some way and numbered, and that the total number of such
relations is bounded by one and the same number s. We shall use circles
to designate the letters, introducing when necessary a numeration of
these circles with numerals written next to the corresponding circles.
These numerals have no relation to the type of symbol (letter)
designated by the given circle. When necessary, the symbol of the
corresponding letter is written inside the circle which represents it.

Any relationship between symbols (letters) can now be represented as
shown in Fig. 3.

The numbers p1, p2, ..., pk in this figure show the position occupied by
the relation in question in the ordered set of relations of the
corresponding letter (designated by the numbered circles). These numbers
(regardless of the choice of the letters and of the form of the relation
R) can take only the values 1, 2, 3, ..., s.

Fig. 3.
We can considerably simplify the writing of the information in this
diagram by adding to the letters T0, T1, ..., Tn another s + K + m
letters: s letters to designate the numbers of the relations of any given
element (squares in Fig. 3), K letters to designate the position numbers
of the related letters within any given relation R (triangles in
Fig. 3), and m letters to designate the relations R1, R2, ..., Rm
themselves. If we denote all the new letters by circles as well, the
information takes the form of a set of circles connected with one another
by paired bonds. Then there is no need for any special numeration of the
order in which a letter occurs in particular relations, since, as shown
in Fig. 3, all the letters related to any single letter will inevitably
be different. Thereby the relations in which a given symbol (letter)
occurs are numbered automatically: by the numbers of the symbols
(letters) with which the given symbol is related.

Thus, finally, the information in the algorithmic diagram is represented
by an arbitrary finite set M whose elements are the fixed letters
T0, T1, ..., TN (N > 1), where each of the letters T2, T3, ..., TN can
occur in the set M any number of times and, in addition, exactly one of
the letters T0 or T1 occurs in the set, exactly once. On this set a
paired relation is established (certain letters are "joined" pairwise
with one another) so that the following condition "a" is satisfied: all
the letters connected with any single letter of the set M are pairwise
different.

In other words, the information takes the form of a one-dimensional
complex (linear undirected graph), whose vertices (designated by the
circles) are identified with the letters T0, T1, ..., TN, and whose
(undirected) lines connecting certain pairs of vertices are identified
with the paired relations between the letters described above.
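The two requirements on an information complex can be checked mechanically. This is a minimal sketch, assuming a complex is stored as a dict from vertex ids to letter indices (0 for T0, 1 for T1, and so on) plus a set of two-element frozensets for the lines, with no loops. This representation and the name is_valid_complex are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def is_valid_complex(labels: Dict[int, int], edges: Set[FrozenSet[int]]) -> bool:
    """Check the single initial vertex and condition "a" for a complex."""
    # Exactly one vertex carries T0 or T1 (the initial vertex).
    if sum(1 for t in labels.values() if t in (0, 1)) != 1:
        return False
    neighbours: Dict[int, List[int]] = defaultdict(list)
    for edge in edges:
        u, v = tuple(edge)
        neighbours[u].append(v)
        neighbours[v].append(u)
    # Condition "a": the letters joined to any single vertex are pairwise different.
    for adj in neighbours.values():
        adj_labels = [labels[w] for w in adj]
        if len(adj_labels) != len(set(adj_labels)):
            return False
    return True


labels = {0: 0, 1: 2, 2: 3, 3: 2}  # vertex 0 is the initial vertex T0
edges = {frozenset((0, 1)), frozenset((0, 2)), frozenset((2, 3))}
print(is_valid_complex(labels, edges))                          # True
print(is_valid_complex(labels, edges | {frozenset((0, 3))}))    # False: two T2 neighbours
```

One consequence worth noting: by condition "a" a vertex has at most N + 1 neighbours, one per letter.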
The requirement that the complex in question (the set M) contain one and
only one vertex identified either with the letter T0 or with the letter
T1 is connected with the need to establish a reference point (the center
of the active zone) of the information. One of these letters (we assume
that it is T0) is required for the complexes designating information
whose processing is not yet completed. The other (in the present case
T1) is required for the complexes designating the terminal information
from which the final results of the operation of the algorithm must be
extracted.

The vertex of the information complex S which is identified with the
letter T0 or T1 is termed the initial vertex of the complex. The active
zone of the complex S is the subcomplex of S which consists of the
vertices (letters) and the lines (relations) belonging to the chains of
length λ ≤ P containing the initial vertex, where P is a fixed number
determined for the given algorithm. Here and hereafter we use the term
chain to designate any finite sequence of vertices B1, B2, ..., Bp such
that any two neighboring vertices in this sequence are connected by
lines; these lines are also included in the chain, and their number
(equal to p - 1) is termed the length of the chain.

The ensemble of all the vertices of the active zone of the information
complex which are connected with the initial vertex by chains of length
P and are not connected with it by chains of lesser length is termed the
boundary of this zone. A complex is called connected if any two of its
vertices can be joined by a chain. The ensemble of vertices and lines
lying beyond the limits of the active zone of the complex S is termed the
external portion of the complex.
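The active zone and its boundary can be computed by breadth-first search from the initial vertex. A minimal sketch, assuming the complex is given as an adjacency dict. It also assumes our own reading of the definition above: a line belongs to the zone when one of its ends lies at distance at most P - 1 from the initial vertex, since only then does it lie on a chain of length at most P through that vertex. The name active_zone is hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple


def active_zone(adj: Dict[Hashable, List[Hashable]], start: Hashable,
                P: int) -> Tuple[Set, Set[FrozenSet], Set]:
    """Return (vertices, lines, boundary) of the active zone of radius P."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == P:
            continue  # do not look beyond distance P
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    vertices = set(dist)
    boundary = {v for v, d in dist.items() if d == P}
    lines = {frozenset((u, w)) for u in vertices for w in adj[u]
             if w in vertices and min(dist[u], dist[w]) < P}
    return vertices, lines, boundary


# A path 0-1-2-3-4 with P = 2: the zone is {0, 1, 2}, its boundary {2}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(active_zone(path, 0, 2))
```

In a triangle with P = 1 both other vertices form the boundary, and the line joining them is not part of the zone, because any chain through the initial vertex that uses it has length 2.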
Two complexes are termed mutually isomorphic if a one-to-one
correspondence can be established between their vertices such that
corresponding vertices are designated by identical letters among
T0, T1, ..., TN, and corresponding pairs of vertices are either both
connected or both not connected to one another. Mutually isomorphic
complexes are in essence identical and differ at most in the way they are
represented (the positions of the vertices in the plane, for example).

Since the total number of vertices in the active zone of the information
complex of any given algorithm A is bounded (by condition "a" every
vertex has at most N + 1 neighbors, and the radius P of the zone is
fixed), and since the number of letters T0, T1, ..., TN is bounded, there
exists only a finite number of different (pairwise nonisomorphic) active
zones U1, U2, ..., Ur. Consequently, the rules for their processing can
be given by a simple correspondence table Ui → Wi (i = 1, 2, ..., r).

The complexes in the right-hand column of this table must have
subcomplexes isomorphic to the boundaries of the corresponding active
zones Ui, and these isomorphisms must be fixed once and for all. In other
words, to each vertex lying on the boundary L(Ui) of the active zone
there must correspond a completely determined vertex of the complex Wi
(i = 1, 2, ..., r). Each of the complexes Wi must satisfy all the
conditions imposed above on the information complexes; in particular, it
must have one and only one initial vertex, designated by the letter T0 or
by the letter T1.

With the aid of the constructed correspondence table we determine the
operator which performs the direct processing of the information complex
at each step of the operation of the given algorithm A. In the
information complex S under consideration (initial or intermediate) we
find the initial vertex. Drawing from it all possible chains of length P,
we construct the active zone U and determine its boundary L(U).

Further, we find the (unique) active zone Ui in the left-hand column of
the correspondence table which is isomorphic to the active zone U just
found. Because of the properties of the information complexes defined
above (in particular property "a") and the connectedness of the two
complexes Ui and U, only one isomorphism is possible: the initial vertex
must go to the initial vertex, and condition "a" then determines the
image of each neighbor of a vertex already matched. This makes it
possible to identify uniquely the vertices lying on the boundary L(U) of
the active zone U with the corresponding vertices, under this
isomorphism, on the boundary L(Ui) of the active zone Ui, and, using the
identification of vertices fixed in the correspondence table, also with
certain vertices of the complex Wi.

Now it is easy to remove the whole interior portion of the active
zone U, i.e., the portion not lying on the boundary L(U), and replace
it by the subcomplex of the complex Wi which includes all the elements
of this complex except the vertices identified earlier. Thus, we
"insert" the new complex Wi into the information complex in question in
place of the interior portion of its active zone, while leaving the
boundary of the active zone unchanged.

Since in the complex Wi the initial vertex occupies a new position
relative to the boundary of the previous active zone, the new active
zone, determined after the insertion, will have a different boundary.
The new information complex S' obtained after such an insertion is the
result of applying the direct processing operator of the algorithm A to
the original information complex S. The direct processing operator is
applied repeatedly to the resulting information complex until a complex
is obtained whose initial vertex is designated by the letter T1 and not
by the letter T0.

Such a complex is termed a terminal complex, and its maximal connected
subcomplex containing the initial vertex T1 is considered to be the
solution, i.e., the information complex obtained as the result of the
action of the algorithm A on the initial (input) information complex
S0. If, however, the algorithm continues to operate without end and
never obtains a terminal complex, then, just as in the case of normal
algorithms, we take it that the algorithm in question is not applica-
ble to the given initial complex S0.

We can extend the definition of the algorithm so as to permit com-
plexes without an initial vertex in the right side of the correspon-
dence table. Applying a substitution with such a right side leads to
natural termination of the algorithmic process, since determining the
active zone and making further substitutions become impossible.

However, since no terminal complex (in the sense defined above)
appears, in this case too the algorithmic process must be considered
to have terminated without result, and the algorithm is considered in-
applicable to the corresponding initial information complex.

Still another type of unsuccessful termination of the algorithmic
process is possible when the correspondence table does not contain all
the forms of active zones which are possible for the given algorithm.
If the information complex reaches a state in which, although there is
an initial vertex designated by the letter T0, none of the substitu-
tions of the correspondence table is applicable, the algorithm is like-
wise considered inapplicable to the corresponding initial information
complex.

We must make one more remark on the nature of the substitutions in
the correspondence table. Unless special measures are taken, the sub-
stitutions may violate condition "a" introduced above, which must be
satisfied by all the information complexes we are considering. To avoid
such a distortion of the information, it is clearly sufficient to as-
sume the following: any vertex of an arbitrary complex Wi from the
right side of the correspondence table which, in the process of "in-
sertion," is identified with some vertex of the boundary of the active
zone can be connected by lines only with the initial vertex and with
vertices designated by the same letters as the vertices to which the
corresponding boundary vertex is connected by lines in the complex Ui.

This condition (we term it "b") does not restrict the generality
of our considerations, since the boundary used in performing the "in-
sertion" operation is defined quite arbitrarily. If we included in the
boundary not only the vertices at distance P from the initial vertex
(connected with it by chains of length P but not by shorter chains)
but also the vertices at distance P - 1, then, in establishing the
isomorphism of the boundaries in the complexes Ui and Wi, we would ob-
tain, as is easy to see, a stronger restriction on the correspondence
table than the one imposed by condition "b".

Careful analysis of the description of the Kolmogorov-Uspensky
algorithmic scheme shows that in form this scheme closely resembles the
work actually performed by a person who processes externally supplied
information according to the rules of an algorithm he has memorized.
The developers of this scheme took special measures not to lose gener-
ality in the nature of the transformations performed. Nevertheless,
they showed that the scheme they described makes it possible to con-
struct only normalizable algorithms. This result can be regarded as a
confirmation of the normalization principle formulated in §2.
§4. OTHER THEORETICAL ALGORITHMIC SYSTEMS

Historically, the first algorithmic system to receive fairly
complete and thorough development was the system based on the use of
constructively defined arithmetic (integer-valued) functions, which
were given the name recursive functions. The use of these functions in
the theory of algorithms is based on the idea of numeration of the
words in any alphabet by consecutive natural numbers. This numeration
can be accomplished most simply by arranging the words in increasing
order of their lengths and arranging words of the same length in an
arbitrary (for example, lexicographic) order.
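The numeration just described can be sketched as follows, under the
assumption (ours) that words of equal length are ordered lexicographi-
cally with respect to a fixed order of the alphabet and that the empty
word receives number 0. Within one length a word is then simply read
as a numeral in base k, where k is the number of letters. The function
names are hypothetical.

```python
def word_number(word, alphabet):
    """Number of `word` in the length-then-lexicographic enumeration
    of all words over `alphabet`; the empty word gets number 0."""
    k = len(alphabet)
    index = {letter: i for i, letter in enumerate(alphabet)}
    # All shorter words come first: k**0 + k**1 + ... + k**(len - 1) of them.
    shorter = sum(k ** m for m in range(len(word)))
    # Rank among words of the same length: read `word` as a base-k numeral.
    rank = 0
    for letter in word:
        rank = rank * k + index[letter]
    return shorter + rank


def nth_word(n, alphabet):
    """Inverse of word_number: the word with number n."""
    k = len(alphabet)
    length = 0
    while n >= k ** length:   # skip over whole blocks of equal-length words
        n -= k ** length
        length += 1
    letters = []
    for _ in range(length):   # write n as a base-k numeral of fixed length
        n, r = divmod(n, k)
        letters.append(alphabet[r])
    return "".join(reversed(letters))
```

Over the alphabet "ab" this gives the order "", "a", "b", "aa", "ab",
"ba", "bb", "aaa", ..., so that, for instance, the word "ab" has
number 4.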

After numeration of the input and output words, an arbitrary al-
phabetic operator is transformed into an operator y = f(x) in which
both the argument x and the function f itself take nonnegative integer
values. The function f(x), of course, need not be defined for all val-
ues of the argument x, but only for certain values of x, which consti-
tute the domain of definition of this function. Such partially defined
integer-valued functions of integer arguments are usually termed
arithmetic functions for brevity.

Among the arithmetic functions we single out the following partic-
ularly simple ones, which we shall term elementary arithmetic func-
tions: the function identically equal to zero (defined for all non-
negative integer values of its arguments); the identity functions
f(x1, ..., xn) = xi, which repeat the values of their arguments; and
the direct succession function f(x) = x + 1, which is also defined for
all nonnegative integer values of its argument.

Using the elementary arithmetic functions just listed as the orig-
inal functions, we can, with the aid of a small number of general con-
structive techniques, construct ever more complex arithmetic functions.
In the theory of recursive (constructive arithmetic) functions three
operations are of particularly great importance: superposition, primi-
tive recursion and the least root operation.

The operation of superposition of functions consists in substitut-
ing some arithmetic functions in place of the arguments of other
arithmetic functions. In this way we can construct new arithmetic
functions from functions already known. For example, performing the
superposition of the functions f(x) = 0 and g(x) = x + 1, i.e., form-
ing g(f(x)), we arrive at the function h(x) = 1. Superposing the func-
tion g(x) with itself gives the function p(x) = g(g(x)) = x + 2, and
so on.
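The elementary functions and the superposition operation can be
mimicked with ordinary Python functions. In this sketch the names zero,
identity, succ and compose are our own; identity(i) stands for the
identity function returning the i-th argument, counted from 1.

```python
def zero(*args):
    """Elementary function identically equal to zero."""
    return 0


def identity(i):
    """Identity function returning its i-th argument (counted from 1)."""
    def proj(*args):
        return args[i - 1]
    return proj


def succ(x):
    """Direct succession function x + 1."""
    return x + 1


def compose(f, *gs):
    """Superposition: substitute g1, ..., gm in place of the arguments of f."""
    def h(*args):
        return f(*(g(*args) for g in gs))
    return h


h = compose(succ, zero)  # h(x) = succ(zero(x)) = 1
p = compose(succ, succ)  # p(x) = succ(succ(x)) = x + 2
```

Here h(5) evaluates to 1 and p(3) to 5, matching the two examples in
the text.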

The operation of primitive recursion makes it possible to con-
struct an n-place arithmetic function (function of n arguments) from
two given functions, one of which is (n - 1)-place and the other
(n + 1)-place. The method of this construction is determined by the
following two relations:

$$f(x_1, \ldots, x_{n-1}, 0) = g(x_1, \ldots, x_{n-1}), \qquad (6)$$

$$f(x_1, \ldots, x_{n-1}, x_n + 1) = h(x_1, \ldots, x_{n-1}, x_n, y), \qquad (7)$$

where y = f(x1, ..., x_{n-1}, x_n); f is the function being defined,
and g and h are the given functions.

For a proper understanding of the operation of primitive recur-
sion, we must note that every function of a smaller number of variables
can be considered as a function of any larger number of variables. In
particular, constant functions, which it is natural to regard as
functions of zero arguments, can if desired be considered as functions
of any finite number of arguments.

As an example, let us consider how the operation of primitive re-
cursion is applied to construct the two-place summation function
f(x, y) = x + y from the elementary arithmetic functions. This function
is defined with the aid of the identity function g(x) = x and the di-
rect succession function h(x) = x + 1 (here h is regarded, as noted
above, as a function of three arguments that depends only on the last
one):

$$f(x, 0) = x = g(x),$$

$$f(x, y + 1) = (x + y) + 1 = h(f(x, y)).$$

Similarly, we can construct the product, the exponential and power
functions, and other widely known arithmetic functions.
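Relations (6) and (7) can be turned into a small higher-order
function. The sketch below, with the hypothetical name
primitive_recursion, evaluates f(x1, ..., x_{n-1}, x_n) by starting
from g and applying h for x_n = 0, 1, ..., x_n - 1, which computes the
same values as unfolding the recursion. The summation function follows
the text; the product and power definitions are our own illustrations
of the remark above.

```python
def primitive_recursion(g, h):
    """Build f from g ((n-1)-place) and h ((n+1)-place) via (6) and (7):
    f(xs, 0) = g(xs),  f(xs, t + 1) = h(xs, t, f(xs, t))."""
    def f(*args):
        *xs, last = args
        value = g(*xs)             # relation (6)
        for t in range(last):      # relation (7) for t = 0, 1, ..., last - 1
            value = h(*xs, t, value)
        return value
    return f


# Summation: g(x) = x, h(x, y, z) = z + 1 (h depends only on its last argument)
add = primitive_recursion(lambda x: x, lambda x, y, z: z + 1)
# Product: f(x, 0) = 0, f(x, y + 1) = f(x, y) + x
mul = primitive_recursion(lambda x: 0, lambda x, y, z: add(z, x))
# Power: f(x, 0) = 1, f(x, y + 1) = f(x, y) * x
power = primitive_recursion(lambda x: 1, lambda x, y, z: mul(z, x))
```

For instance, add(3, 4) gives 7, mul(3, 4) gives 12 and power(2, 10)
gives 1024.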

The functions which can be constructed from the elementary arith-
metic functions by applying the operations of superposition and primi-
tive recursion any finite number of times in any sequence are termed
primitive recursive functions.

Most of the arithmetic functions encountered in practice are prim-
itive recursive. Nevertheless, the primitive recursive functions do not
include all the arithmetic functions which can be defined construc-
tively. To construct all such functions, other operations must be used,
in particular the least root operation.

The least root operation makes it possible to define a new arith-
metic function f(x1, ..., xn) of n variables with the aid of a previ-
ously constructed arithmetic function g(x1, ..., xn, y) of n + 1 vari-
ables. For any given set of values of the variables x1 = a1, ...,
xn = an, we take as the corresponding value f(a1, a2, ..., an) of the
function being defined the least nonnegative integer root y = p of the
equation g(a1, a2, ..., an, y) = 0. If this equation has no nonnegative
integer roots, the function f(x1, x2, ..., xn) is considered undefined
for the corresponding set of values of the variables. It is usually
also assumed that the function f(x1, x2, ..., xn) is undefined on the
set x1 = a1, x2 = a2, ..., xn = an if, although the least root y = p of
the equation g(a1, a2, ..., an, y) = 0 exists, the function
g(a1, a2, ..., an, y) is undefined for at least one nonnegative integer
value y = v satisfying the relation 0 ≤ v ≤ p - 1.
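A minimal sketch of the least root operation (the name least_root is
ours): trying y = 0, 1, 2, ... in order reproduces both conventions
above. If g is undefined (in Python, raises an exception) for some y
below the least root, the exception propagates and f is undefined
there; if no root exists, the loop never ends, which is exactly how
partiality shows up in practice.

```python
from itertools import count


def least_root(g):
    """Least root operation: f(a1, ..., an) is the least y = 0, 1, 2, ...
    with g(a1, ..., an, y) == 0. If g raises for some smaller y, so does f;
    if there is no root, the search never terminates (f is undefined)."""
    def f(*args):
        for y in count():
            if g(*args, y) == 0:
                return y
    return f


# The integer part of x / 2, as the least y with 2 * (y + 1) > x.
half = least_root(lambda x, y: 0 if 2 * (y + 1) > x else 1)

# A genuinely partial function: x - z, undefined (never returns) when x < z.
difference = least_root(lambda x, z, y: 0 if y + z == x else 1)
```

Calling difference(9, 4) returns 5, while difference(4, 9) would run
forever, mirroring the case of an equation without nonnegative integer
roots.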
The arithmetic functions which can be constructed from the elemen-
tary arithmetic functions with the aid of the operations of superposi-
tion, primitive recursion and least root are termed partial recursive
functions. If these functions are, in addition, defined everywhere,
they are termed general recursive functions.

In this definition, just as in the definition of the primitive re-
cursive functions, provision is made for performing all the admissible
operations in any sequence and any finite number of times. There ex-
ists, however, a result of Kleene [41] which makes it possible to ob-
tain any partial recursive function from two primitive recursive func-
tions by applying to them in sequence a single least root operation
and a single superposition operation.

This result can be formulated more exactly as follows:

for any partial recursive function f(x1, ..., xn) there exist two
primitive recursive functions g(x1, ..., xn, y) and h(x) such that the
function f(x1, ..., xn) can be obtained from them in the form

$$f(x_1, \ldots, x_n) = h\bigl(\mu y\,[g(x_1, \ldots, x_n, y) = 0]\bigr),$$

where μy is the least root operator. Here the function h(x) can be
chosen once and for all, regardless of the choice of f.

The partial recursive functions are the most general class of con-
structively definable arithmetic functions. They include, in particu-
lar, all the arithmetic functions which can be given in the form of
finite recursive schemes of arbitrary form. Here a finite recursive
scheme is understood to be any finite system of equalities r = s, where
r and s are any finite expressions (containing a finite number of sym-
bols) constructed from known primitive recursive functions and unknown
functions with numerical and literal arguments, such that the values of
the unknown functions for any given values of the arguments are deter-
mined uniquely after a finite number of steps (depending on the choice
of the values of the arguments) as a result of applying two rules. The
first rule (substitution rule) consists in substituting into one of the
given equalities some numerical value in place of one of the argu-
ments. The second rule (replacement rule) makes it possible to use an
equality of the form x = f(x1, x2, ..., xn), where x, x1, x2, ..., xn
are numbers, to replace some occurrence of the quantity
f(x1, x2, ..., xn) in one of the equalities r = s by the number x.

It turns out that all the general recursive functions, and only
such functions, can be represented in this manner. This makes it pos-
sible, following Herbrand and Gödel, to define the general recursive
functions as the functions represented by finite recursive schemes of
the form described above.

If, retaining the condition of single-valuedness, we do not re-
quire that the values of the functions appearing in the scheme be de-
fined for all values of the arguments, we can represent the partial
recursive functions by similar schemes. It is essential that no recur-
sive definitions (using finite schemes) make it possible to go beyond
the limits of the class of partial recursive functions.

After the numeration of the input and output words has been accom-
plished, any normal algorithm can be realized in the form of a partial
recursive function. Conversely, any algorithm which is realizable with
the aid of a partial recursive function is equivalent to some normal
algorithm. Thus, we can draw the following important conclusion.

An algorithm is normalizable if and only if it can be realized with
the aid of a partial recursive function.

This proposition shows that the arithmetic (numerical) approach to
the theory of algorithms also does not lead beyond the class of nor-
malizable algorithms.
Let us consider two other approaches to the theory of algorithms,
proposed in 1936 by Post [63] and Turing [73].

In the algorithmic system proposed by Post, the input and output
information is represented in standard binary form, while the algo-
rithm is given in the form of a finite ordered set of rules termed or-
ders. For writing the input, output and intermediate information, use
is made of a hypothetical endless information tape which is divided
into individual cells, each of which can contain only a single letter
(the digit 0 or 1). Cells in which ones are written are termed signed,
and those in which zeros are written are termed unsigned. At any in-
stant of operation of the algorithm only a finite number of cells can
be signed.

The algorithm operates in discrete steps, at each of which one of
the orders constituting the algorithm is executed. To each step there
corresponds a definite active cell on the information tape. Some ini-
tial cell is fixed as the active cell for the first order. Further
changes in the location of the active cell on the tape must be pro-
vided for in the algorithm itself. Each order of the algorithm belongs
to one of the following six types.

First type. Sign the active cell of the tape (write a one in it)
and go to the performance of the i-th order (i can be any of the num-
bers used for numbering the orders of the algorithm).

Second type. Erase the sign of the active cell (write a zero in it)
and go to the performance of the i-th order.

Third type. Shift the active cell one step to the right and go to
the performance of the i-th order.

Fourth type. Shift the active cell one step to the left and go to
the performance of the i-th order.

Fifth type. If the active cell is signed (a one is written in it),
go to the performance of the j-th order; if the active cell is not
signed (a zero is written in it), go to the performance of the i-th
order.

Sixth  type.  Stop,  termination  of  operation  of  the  algorithm. 

Algorithms composed of any finite number of orders of the types de-
scribed are called Post algorithms. It has been shown that Post algo-
rithms reduce to algorithms realizable with the aid of partial recur-
sive functions and, conversely, that any partial recursive function can
be represented by an algorithm of the Post system. Thus, we can formu-
late the following proposition.

The class of all algorithms equivalent to Post algorithms coincides
with the class of all normalizable algorithms.
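To make the six types of orders concrete, here is a small simulator.
The tuple encoding of orders, the function name run_post and the
example program are our own assumptions, not Post's notation. The
tape is kept as the finite set of signed positions, and the step bound
max_steps exists only so that the sketch always returns; in the actual
scheme a non-stopping algorithm simply runs forever.

```python
def run_post(program, signed, start=1, max_steps=10_000):
    """Simulate a Post algorithm (hypothetical encoding).

    program maps order numbers to tuples:
      ("mark", i)      type 1: sign the active cell, go to order i
      ("erase", i)     type 2: erase the sign of the active cell, go to i
      ("right", i)     type 3: shift the active cell right, go to i
      ("left", i)      type 4: shift the active cell left, go to i
      ("branch", j, i) type 5: go to j if the active cell is signed, else i
      ("stop",)        type 6: terminate
    signed is the finite set of signed positions; the active cell starts at 0.
    Returns the final set of signed cells, or None after max_steps steps.
    """
    tape = set(signed)
    active = 0
    order = start
    for _ in range(max_steps):
        kind, *targets = program[order]
        if kind == "stop":
            return tape
        if kind == "mark":
            tape.add(active)
        elif kind == "erase":
            tape.discard(active)
        elif kind == "right":
            active += 1
        elif kind == "left":
            active -= 1
        if kind == "branch":
            j, i = targets
            order = j if active in tape else i
        else:
            order = targets[0]
    return None


# Example: n is written as n signed cells starting at the active cell;
# the program appends one more signed cell, computing n + 1.
successor = {
    1: ("branch", 2, 3),  # signed: keep moving right; unsigned: order 3
    2: ("right", 1),
    3: ("mark", 4),
    4: ("stop",),
}
```

Started on the signed cells {0, 1, 2} (the number 3), the program
returns {0, 1, 2, 3}.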

The algorithmic scheme proposed by Turing [73] at the same time as
Post's is quite close to the scheme just described. In the Turing
scheme, which is customarily termed the Turing machine, the information
is also recorded on a two-way infinite information tape divided into
individual cells. However, in contrast with the Post algorithm, an ar-
bitrary finite alphabet may be used here for writing the information.
Each cell of the information tape serves for writing a single letter.
This letter can be scanned by a sensing element, the so-called head of
the Turing machine, which can move along the information tape in both
directions. The head of the Turing machine can be in a finite number of
different states q1, q2, ..., can print in the scanned cell any letter
x1, x2, ..., and can shift to the right or left along the information
tape by one cell.

The algorithm realized by a Turing machine is written with the aid
of the operating program of this machine, which is a set of groups of
five symbols of the form qi xj xk ql s. Such a group of five symbols
means that the head of the Turing machine which is in the state qi and
senses the letter xj recorded on the tape will print in place of this
letter the new letter xk (which in a particular case may coincide with
the previously recorded letter xj), go over to the new state ql (which
may also coincide with the previous state), and make a shift along the
tape of magnitude s, equal to ±1.

The original scheme of the Turing machine was intended for writing
out the values taken by an arbitrary single-place partial recursive
function for the values of the argument 0, 1, 2, ....

In this case, of course, the Turing machine must operate infinitely
long. We can construct a Turing machine which computes the values of
any a priori given partial recursive function. It is advisable, how-
ever, to modify the original scheme of the Turing machine described
above. Let us assume that the last symbol s of the group of five sym-
bols describing the operation of the Turing machine can take, in addi-
tion to the values ±1 introduced above, a third value, "stop machine."
With this addition the Turing machine is converted into an ordinary al-
gorithmic system. It either processes the input word p initially writ-
ten on the tape infinitely long, or it stops after a finite number of
transformation steps. In the first case it is assumed, as usual, that
the algorithm realized by the machine is not applicable to the input
word p. In the second case the information remaining on the tape at
the instant the machine stops is taken as the output word into which
the machine transforms the given input word p. In this case, of
course, the alphabet used for recording information on the tape must
contain a special empty letter to designate those cells in which no
information is written.
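The modified machine can be simulated in a few lines. In this sketch
(names and encoding are our own assumptions) the program is a
dictionary mapping a pair (state, scanned letter) to a triple (printed
letter, new state, shift), where the shift is +1, -1 or "stop"; the
symbol "_" plays the role of the empty letter. As with the Post sketch,
max_steps is only a safeguard of the simulation: returning None stands
for "has not stopped yet," since non-applicability cannot in general be
detected. The example program for binary increment is ours, not the
book's.

```python
BLANK = "_"  # assumed symbol for the special empty letter


def run_turing(program, word, state="q1", max_steps=10_000):
    """Simulate the modified Turing machine; the head starts on the first
    letter of `word`. Returns the non-blank part of the tape when the
    machine stops, or None after max_steps steps. A (state, letter) pair
    missing from the program raises KeyError."""
    tape = dict(enumerate(word))   # cell position -> letter
    head = 0
    for _ in range(max_steps):
        scanned = tape.get(head, BLANK)
        printed, state, shift = program[(state, scanned)]
        tape[head] = printed
        if shift == "stop":
            used = [pos for pos, letter in tape.items() if letter != BLANK]
            if not used:
                return ""
            return "".join(tape.get(pos, BLANK)
                           for pos in range(min(used), max(used) + 1))
        head += shift
    return None


# Example program: add 1 to a binary number written on the tape.
increment = {
    ("q1", "0"): ("0", "q1", +1),        # run right over the digits
    ("q1", "1"): ("1", "q1", +1),
    ("q1", BLANK): (BLANK, "q2", -1),    # past the last digit: turn back
    ("q2", "1"): ("0", "q2", -1),        # 1 + carry = 0, carry continues
    ("q2", "0"): ("1", "q2", "stop"),    # 0 + carry = 1, done
    ("q2", BLANK): ("1", "q2", "stop"),  # carry out of the leading digit
}
```

With this program, run_turing(increment, "1011") returns "1100".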
It can be shown that all algorithms which are realizable with the
aid of the described modification of Turing machines are normalizable
and, conversely, that any normalizable algorithm can be realized with
the aid of a Turing machine specially constructed for this purpose. By
writing the programs of Turing machines and their input words in some
standard alphabet, we can construct a universal Turing machine by ex-
actly the same method used in constructing the universal normal algo-
rithm (§2). Giving the universal Turing machine the representation of
the program of any given Turing machine M and the representation of
any input word p, we obtain the representation of the output word q
into which the machine M transforms the input word p. If, however, the
algorithm realized by the machine M is not applicable to the word p
(the machine M works on its transformation infinitely long), then the
algorithm realized by the universal Turing machine is also not applic-
able to the word formed from the representations of the word p and the
program of the machine M.

Thus, in spite of considerable qualitative differences, all the al-
gorithmic systems described lead, in essence (up to equivalence), to
the same class of algorithms. This conclusion is yet another confirma-
tion that the modern theory of algorithms embraces an extremely broad
class (if not all) of constructively definable alphabetic operators.

§5. THE CONCEPT OF ALGORITHMICALLY INSOLUBLE PROBLEMS

Every algorithm is a method of solving some mass problem, that is,
a problem which can be formulated as the processing not of one, but of
an entire set of input words into the corresponding output words.

Since both the condition and the solution of any problem can be ex-
pressed in the form of individual words, every algorithm can be consid-
ered as a universal method for solving an entire class of problems.


A detailed analysis shows that there also exist classes of problems
for whose solution there is not, and cannot be, a single universal
technique. Such mass problems are termed algorithmically insoluble
problems. However, the algorithmic insolubility of the problem of
solving all the problems of a particular class does not at all mean
that any specific problem of this class cannot be solved. What is at
issue is the impossibility of solving all the problems of the given
class by one and the same technique.

For a better understanding of algorithmic insolubility, we shall
present examples of algorithmically soluble and algorithmically insol-
uble problems.

A typical example of an algorithmically soluble problem is the
proof of identities in ordinary algebra. For simplicity we shall limit
ourselves to the case when the identities are constructed from ratio-
nal numbers and letters (denoting variables) with the aid of the oper-
ations of addition, subtraction and multiplication. The following gen-
eral technique for solving this problem is well known from the school
algebra course: using the distributive law of multiplication, we remove
the parentheses in the left and right sides of the given relation and
collect like terms according to the well-known rules. After all these
transformations, both the left and right sides of the original rela-
tion are transformed into polynomials. The relation is an identity if
and only if these polynomials coincide identically. In other words,
the validity of the identity means that after all the terms of the
transformed relation are transferred to one side, they mutually cancel,
turning the relation into the trivial identity 0 = 0.
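The school procedure just described can be sketched directly. Here
polynomials are built in Python with a hypothetical Poly class rather
than parsed from text. Multiplication applies the distributive law term
by term, like terms are collected in a dictionary keyed by monomials,
and a relation is an identity exactly when the difference of its two
sides reduces to the zero polynomial.

```python
from fractions import Fraction


class Poly:
    """Polynomial with rational coefficients, stored as {monomial: coeff},
    where a monomial is a sorted tuple of variable names (with repetition)."""

    def __init__(self, terms=None):
        # Drop zero coefficients so that equal polynomials have equal dicts.
        self.terms = {m: c for m, c in (terms or {}).items() if c != 0}

    @staticmethod
    def var(name):
        return Poly({(name,): Fraction(1)})

    @staticmethod
    def const(value):
        return Poly({(): Fraction(value)})

    def __add__(self, other):
        result = dict(self.terms)
        for m, c in other.terms.items():
            result[m] = result.get(m, 0) + c   # collect like terms
        return Poly(result)

    def __neg__(self):
        return Poly({m: -c for m, c in self.terms.items()})

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        # Distributive law: multiply every term by every term.
        result = {}
        for m1, c1 in self.terms.items():
            for m2, c2 in other.terms.items():
                m = tuple(sorted(m1 + m2))
                result[m] = result.get(m, 0) + c1 * c2
        return Poly(result)


def is_identity(left, right):
    """left = right is an identity iff left - right expands to 0."""
    return not (left - right).terms
```

For example, with x, y = Poly.var("x"), Poly.var("y"), the call
is_identity((x + y) * (x - y), x * x - y * y) returns True, while
(x + y) * (x + y) is not identically equal to x * x + y * y.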
Thus, the identity problem in elementary algebra is algorithmically
solvable: there exists a single constructive technique which makes it
possible, after a finite number of steps, to decide whether any given
relation is an identity. We can, however, construct examples of alge-
braic systems in which the identity problem is algorithmically insol-
uble. As such algebraic systems we might select, for example, semi-
groups or groups given by systems of generating elements and defining
relations. Examples of semigroups with an insoluble identity problem
were first found by Post [64], and the corresponding examples for
groups were found by Novikov [60].

Without writing out the defining relations explicitly, we shall
clarify the essence of these examples. Let x1, x2, ..., xn be the let-
ters of some finite alphabet. The set of all words in this alphabet,
including the empty word e, is termed the free semigroup with the gen-
erating elements x1, x2, ..., xn if for arbitrary pairs of words p, q
there is introduced a multiplication operation amounting simply to ap-
pending one word to the other. We agree to designate the free semigroup
with generating elements x1, x2, ..., xn by F(x1, x2, ..., xn), and to
designate the result of multiplying the word p by the word q by pq.

In the free semigroup we can introduce any set of defining rela-
tions, which are formal equalities between two nonidentical words:
pi = qi (i = 1, 2, ...). Two words in the free semigroup F with a given
system S of defining relations are termed identical, or mutually
equivalent, if one of them can be obtained from the other by an arbi-
trary number of substitutions of the right sides of the defining rela-
tions in place of the left sides and, conversely, of the left sides in
place of the right sides. For example, in the semigroup with the system
of generators (x, y) and the single defining relation xy = yx, the
words p = xxy and q = yxx are mutually identical, since the first word
can be obtained from the second as the result of two substitutions of
the form described above: q = yxx → xyx → xxy = p. Using the reverse
substitutions, the chain written above can be read in the reverse di-
rection, which makes possible not only the transformation of the word
q into the word p but also of the word p into the word q.

The word identity problem for semigroups is formulated as follows.

Assume that in an arbitrary free semigroup F with a finite number of
generators there is given a system S of defining relations consisting
of a finite number of relations. We are required to find a single con-
structive technique which makes it possible, after a finite number of
steps, to decide whether any two given words of the semigroup F with
the system of defining relations S are identical or nonidentical.

For some systems of defining relations the problem formulated is solvable; however, as Post [64] has shown, there also exist systems of defining relations for which the word identity problem is algorithmically unsolvable. This does not mean, of course, that the identity or nonidentity of a fixed specific pair of words cannot be established. What does not exist is a single technique for establishing the identity of an arbitrary pair of words, similar to the technique described above for proving the validity or nonvalidity of any relation in elementary algebra.

The word identity problem for groups coincides in its basic features with the corresponding problem for semigroups. The free group Q with the generating elements $x_1, x_2, \ldots, x_n$ is constructed as the set of words composed of the letters $x_1, x_2, \ldots, x_n$ and the "inverse" letters $x_1^{-1}, x_2^{-1}, \ldots, x_n^{-1}$. Here two mutually inverse letters standing side by side cancel one another (become equivalent to the empty word $e$):

$x_i x_i^{-1} = x_i^{-1} x_i = e$.   (8)

In determining the identity of two words in a group with the system of defining relations S, we must take into account not only the relations appearing in this system but also the relations of the form (8). Just as for semigroups, the word identity problem for groups specified by a finite number of generators and defining relations is algorithmically unsolvable in the general case. Examples of groups with an unsolvable word identity problem were first constructed by Novikov [60].
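For the free group itself, with no defining relations beyond (8), the identity problem is easy: cancel adjacent inverse pairs until none remain, and two words are identical exactly when the resulting reduced words coincide. A minimal sketch, assuming (as a convention of this example only) that a lowercase letter is a generator and the same letter in uppercase is its inverse:

```python
def reduce_word(word):
    """Freely reduce a group word by cancelling adjacent pairs x x^-1 and x^-1 x.

    Convention of this sketch: 'x' is a generator and 'X' is its inverse."""
    stack = []
    for letter in word:
        if stack and stack[-1] != letter and stack[-1].lower() == letter.lower():
            stack.pop()  # mutually inverse neighbours cancel, relation (8)
        else:
            stack.append(letter)
    return "".join(stack)


def identical_in_free_group(p, q):
    """In the free group, p and q are identical iff their reduced forms coincide."""
    return reduce_word(p) == reduce_word(q)


print(repr(reduce_word("xyYX")))              # '' (the empty word e)
print(identical_in_free_group("xYyz", "xz"))  # True
```

Once finitely many further defining relations are imposed, no procedure of this kind need exist, which is what Novikov's examples show.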

How can the algorithmic unsolvability of a particular problem be proved? The classical example of such an unsolvable problem is the problem of recognizing the self-applicability of algorithms. For the exact formulation of this problem we shall treat only normal algorithms in alphabets consisting of no fewer than two letters. With this assumption we can, without loss of generality, stipulate that two of the letters of the alphabet of any algorithm with which we will be concerned are identified with the two letters (0 and 1) of the standard binary alphabet. Under this condition, for any algorithm $A$ considered, its representation $A^u$ in the standard binary alphabet can be regarded as an input word of this algorithm. If the word $A^u$ belongs to the domain of definition of the algorithm $A$, the algorithm is termed self-applicable; otherwise it is termed non-self-applicable.

Both self-applicable and non-self-applicable algorithms exist. An example of a self-applicable (normal) algorithm is the so-called identity algorithm in any alphabet which contains two or more letters. By definition this algorithm is applicable to any word in its alphabet and transforms every input word into itself; in particular, it is applicable to its own representation. An example of a non-self-applicable algorithm is the so-called zero algorithm in any finite alphabet. This algorithm is given by a scheme containing an identity substitution which can be applied to every word (for instance, the non-terminal substitution of the empty word by itself), so that its operation never terminates. By its very definition it is not applicable to any input word, and this means that it is not applicable to its own representation.
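The two examples can be run on a small interpreter for normal algorithms. This is a sketch under stated assumptions: a scheme is an ordered list of substitutions (left side, right side, terminal flag); at each step the first substitution in the list whose left side occurs in the word is applied to its leftmost occurrence; the algorithm stops after a terminal substitution or when no substitution applies. Because the zero algorithm never stops, the interpreter takes an arbitrary step bound and returns `None` once it is exceeded, which only approximates non-applicability. The schemes chosen for the identity and zero algorithms are one possible choice.

```python
def run_normal_algorithm(scheme, word, max_steps=10_000):
    """Run a normal algorithm given by `scheme`, a list of
    (left, right, terminal) substitutions, on the input `word`.

    Returns the output word, or None if the algorithm has not stopped
    after max_steps steps (treated here as "not applicable")."""
    for _ in range(max_steps):
        for left, right, terminal in scheme:
            pos = word.find(left)  # leftmost occurrence; "" occurs at position 0
            if pos != -1:
                word = word[:pos] + right + word[pos + len(left):]
                if terminal:
                    return word  # terminal substitution: stop
                break  # restart from the first substitution of the scheme
        else:
            return word  # no substitution applies: normal stop
    return None


IDENTITY = [("", "", True)]  # empty word -> . empty word: stops at once
ZERO = [("", "", False)]     # empty word -> empty word: applies forever

print(run_normal_algorithm(IDENTITY, "0110"))             # 0110
print(run_normal_algorithm(ZERO, "0110", max_steps=100))  # None
```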

The problem of recognizing the self-applicability of algorithms amounts to finding a single constructive technique which makes it possible, after a finite number of steps, using the scheme of any given algorithm $A$ in some fixed algorithmic system (for example, in the system of normal algorithms), to recognize whether the algorithm $A$ is self-applicable or not.

If we accept the normalization principle formulated in §2, we can assume that the single constructive technique in question is none other than a normal algorithm $B$ which is defined on every word that is the representation of an arbitrary normal algorithm $A$ and which transforms this word into one of two different fixed words $q_1$ and $q_2$, depending on whether the algorithm $A$ is self-applicable or not (the word $q_1$ is the code of the word "self-applicable" and $q_2$ is the code of the word "non-self-applicable").

The algorithm $B$ must also be defined on any input word which is not the representation of any (normal) algorithm. Indeed, otherwise, having obtained no result after some (sufficiently large) number of steps of the algorithm, we would not know whether the input word is the representation of a self-applicable or of a non-self-applicable algorithm. It is also clear that the result of applying the algorithm $B$ to any word which is not the representation of an algorithm must differ both from the word $q_1$ and from the word $q_2$.

Let us assume that an algorithm $B$ with the indicated properties exists. Then there exists a normal algorithm $C$ in the same alphabet $X$ as the algorithm $B$, defined on all those and only those words in the alphabet $X$ which are the representations of non-self-applicable algorithms (we recall that, by the very definition of the algorithm $B$, the alphabet $X$ includes the standard binary alphabet).

Indeed, let us construct a normal algorithm $D$ in the alphabet $X$ whose domain of definition consists of the single word $q_2$. Such an algorithm can be given, for example, in the (normalized) form of the superposition of two normal algorithms $D_1$ and $D_2$. The first is given by a scheme consisting of the single terminal substitution $q_2 \to \cdot$ (which replaces $q_2$ by the empty word and stops), while the second is given by a scheme consisting of the substitutions $x_i \to x_i$, where $x_i$ runs through all the letters of the alphabet $X$. It is clear that the first algorithm transforms into the empty word only the word $q_2$, while the domain of definition of the second algorithm consists only of the empty word. Therefore the domain of definition of the superposition $D$ of the algorithms $D_1$ and $D_2$ consists only of the word $q_2$, which is what we require.
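The construction of $D$ can be mirrored in code. In this sketch an algorithm is modeled as a Python function that returns its output word, or `None` when it is not applicable; `d1` and `d2` compute by hand what the two schemes in the text do (no general interpreter is involved), and the word standing in for $q_2$ is an arbitrary, hypothetical choice (it must be nonempty).

```python
from typing import Callable, Optional

Algorithm = Callable[[str], Optional[str]]  # None means "not applicable"


def superposition(first: Algorithm, second: Algorithm) -> Algorithm:
    """Apply `first`, then apply `second` to its result (if there is one)."""
    def composed(word: str) -> Optional[str]:
        intermediate = first(word)
        return None if intermediate is None else second(intermediate)
    return composed


def make_d1(q2: str) -> Algorithm:
    """Scheme with the single terminal substitution q2 -> . (empty word)."""
    def d1(word: str) -> Optional[str]:
        pos = word.find(q2)
        if pos == -1:
            return word  # no substitution applies: stop with the word unchanged
        return word[:pos] + word[pos + len(q2):]  # erase leftmost q2 and stop
    return d1


def d2(word: str) -> Optional[str]:
    """Scheme x -> x for every letter x: on a nonempty word a substitution
    always applies, so the algorithm stops only on the empty word."""
    return word if word == "" else None


Q2 = "10"  # hypothetical code word standing in for q2
D = superposition(make_d1(Q2), d2)
print([D(w) for w in ("10", "1100", "0")])  # ['', None, None]
```

The sketch also lets the empty input word through (`d1` leaves it empty and `d2` accepts it); among nonempty words, `D` is defined only on `Q2`.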

After constructing the algorithm $D$, forming its superposition with the algorithm $B$ ($B$ applied first, then $D$), and normalizing this superposition, we arrive at a normal algorithm $C$ in the alphabet $X$ whose domain of definition consists of all those and only those words in the alphabet $X$ which are representations of non-self-applicable algorithms. However, this property of the algorithm $C$ is intrinsically contradictory, since the algorithm $C$ can be neither applicable nor nonapplicable to its own representation $C^u$.

Indeed, in the first case the algorithm $C$ would be applicable to its representation and would therefore be self-applicable. But this would contradict the fact that, by its construction, the algorithm $C$ is applicable only to the representations of non-self-applicable algorithms. In the second case, being nonapplicable to its representation, the algorithm $C$ would be one of the non-self-applicable algorithms. But then, by definition, the algorithm $C$ would have to be applicable to its representation, since it is applicable to the representations of all non-self-applicable algorithms. Consequently, the algorithm $C$ would be self-applicable, which again is a contradiction.

Thus the assumption that the problem of recognizing self-applicability is algorithmically solvable leads to a logical contradiction and is therefore false, which proves the algorithmic undecidability of this problem.
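The heart of the argument is a diagonal construction, which can be illustrated in a toy model (an assumption of this sketch, not the book's formalism): algorithms are Python functions, an algorithm's representation is the function object itself, and "not applicable" is modeled by returning `None` instead of running forever. Whatever candidate recognizer is supplied, the algorithm $C$ built from it makes that recognizer give the wrong answer about $C$. All names are hypothetical.

```python
from typing import Callable, Optional

Program = Callable[[object], Optional[str]]  # None models "not applicable"


def build_c(recognizer: Callable[[Program], bool]) -> Program:
    """Build C: applicable exactly to the programs that the recognizer
    declares non-self-applicable."""
    def c(program: Program) -> Optional[str]:
        return None if recognizer(program) else "applied"
    return c


def recognizer_is_wrong_on_c(recognizer: Callable[[Program], bool]) -> bool:
    """Compare the recognizer's verdict on C with C's behaviour on itself."""
    c = build_c(recognizer)
    actually_self_applicable = c(c) is not None  # run C on its own representation
    return recognizer(c) != actually_self_applicable


# Any recognizer fails on its own C; two trivial ones for illustration.
print(recognizer_is_wrong_on_c(lambda p: True))   # True
print(recognizer_is_wrong_on_c(lambda p: False))  # True
```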

We have substantiated this conclusion only under the condition that the algorithm normalization principle is valid. However, the nature of the contradiction used to prove the algorithmic unsolvability of the problem of recognizing the self-applicability of algorithms is in actuality more profound. The reader who is familiar with the paradoxes of set theory and of mathematical logic will easily note that this contradiction has the same nature as the contradiction in the well-known Russell paradox, which establishes the intrinsic contradictoriness of the concept of the "set of all sets not containing themselves as an element."

This circumstance leads to the conclusion that the algorithmic undecidability of the problem of recognizing self-applicability is not a consequence of the narrowness of the modern exact concept of the algorithm. If we were able to construct an exact concept of the algorithm which includes certain nonnormalizable algorithms, the problem of recognizing the self-applicability of algorithms would remain, as before, algorithmically undecidable.

From the algorithmic undecidability of the problem of recognizing the self-applicability of algorithms, the algorithmic undecidability of a whole series of other problems is derived. The general method for these derivations consists in deducing, from the assumption that there exists an algorithm solving a particular problem Q, the existence of an algorithm solving the problem of recognizing the self-applicability of algorithms. Since the latter is impossible, an algorithm solving the problem Q cannot exist either.

Using this general method, the algorithmic undecidability of a number of different problems has been proved, including the general word identity problems for groups and semigroups considered above. We shall mention some other algorithmically undecidable problems whose undecidability has been established by the same method. One of them is the problem of recognizing the applicability of an algorithm to a particular word. An algorithm $A$ operating in some alphabet $X$ can be constructed for which there does not exist an algorithm in the alphabet $X$, or in any extension of it, which transforms into some fixed word those and only those words to which the algorithm $A$ is not applicable.

The problem of constructing, for any given algorithm $A$, an algorithm which transforms into one fixed word all the words to which $A$ is applicable is, as it is not difficult to see, algorithmically solvable: for its solution it is sufficient to construct an algorithm $B$ which transforms every word in the alphabet of the algorithm $A$ into this fixed word, and to form the superposition of the algorithms $A$ and $B$. We say that an algorithm annuls particular words if it transforms them into the empty word $e$. The problem of recognizing annulment for a given algorithm $A$ consists in constructing an algorithm $B$ (in the same alphabet as $A$) which annuls all those and only those words which the algorithm $A$ does not annul. In the general case this problem is algorithmically undecidable, namely: we can select the algorithm $A$ so that an algorithm $B$ with the indicated properties cannot be constructed.

Quite frequently, proofs of the algorithmic unsolvability of particular problems make use of Post's [64] proof of the algorithmic unsolvability of the following problem, which has been termed the Post combinatorial problem. Assume that in an arbitrary finite alphabet $X$ there is given any finite system $S$ of pairs of words $(p_1, q_1), \ldots, (p_n, q_n)$. We are required to find a single constructive technique which makes it possible, for any such system $S$ and after a finite number of steps, to answer the question of whether we can construct from the first elements of the pairs of the system $S$ a word $p_{i_1} p_{i_2} \ldots p_{i_k}$ ($k \geq 1$) which coincides with the word $q_{i_1} q_{i_2} \ldots q_{i_k}$ constructed from the corresponding second elements of the same system of pairs.
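What the unsolvability result rules out is a search that is guaranteed to stop. A bounded breadth-first search, sketched below with hypothetical names and an arbitrary length limit, finds a solution whenever one of length at most the limit exists, but a `None` answer says nothing about longer index sequences. Partial sequences are discarded when neither concatenation is a prefix of the other, since they can never be completed to equal words.

```python
from collections import deque


def post_search(pairs, max_len=8):
    """Look for a nonempty index sequence i1, ..., ik (k <= max_len) with
    p_i1 ... p_ik == q_i1 ... q_ik, where `pairs` is a list of (p, q) words.
    Returns the 0-based index list, or None if none was found within max_len."""
    queue = deque(([i], p, q) for i, (p, q) in enumerate(pairs))
    while queue:
        seq, top, bottom = queue.popleft()
        if top == bottom:
            return seq
        # A sequence can only be extended to a solution if one side is a
        # prefix of the other.
        if not (top.startswith(bottom) or bottom.startswith(top)):
            continue
        if len(seq) < max_len:
            for i, (p, q) in enumerate(pairs):
                queue.append((seq + [i], top + p, bottom + q))
    return None


pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(post_search(pairs))  # [2, 1, 2, 0]: bba ab bba a = bb aa bb baa
```

Removing the bound gives a procedure that stops on every system that has a solution but may run forever on a system that has none.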

The problem of matrix representability is also algorithmically unsolvable. To formulate this problem, we stipulate that a matrix $U$ is representable in terms of the matrices $U_1, U_2, \ldots, U_n$ if for some finite sequence $U_{i_1}, U_{i_2}, \ldots, U_{i_k}$ of these matrices (generally speaking, with repetitions) the product $U_{i_1} U_{i_2} \cdots U_{i_k}$ of all the matrices appearing in the sequence coincides with the given original matrix $U$. The representability problem consists in finding a general constructive technique by which, after a finite number of steps, for any matrix $U$ and any finite system $S$ of matrices, we would be able to know whether the matrix $U$ is representable in terms of the matrices of the system $S$ or not.
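The same observation applies here. Below is a hedged sketch of a bounded search, assuming integer matrices stored as tuples of rows (hashable, so that all products of a given length can be kept in a set); the matrices `A` and `B` are arbitrary examples, and the number of products can grow exponentially with the length bound.

```python
def mat_mul(a, b):
    """Product of two square matrices given as tuples of row tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def representable_within(u, system, max_len):
    """True if u equals some product U_i1 ... U_ik of matrices from `system`
    with 1 <= k <= max_len (repetitions allowed); False otherwise."""
    products = set(system)  # all products of length 1
    for _ in range(max_len):
        if u in products:
            return True
        products = {mat_mul(p, m) for p in products for m in system}
    return False


A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))
U = ((1, 3), (0, 1))
print(representable_within(U, [A, B], 3))  # True (U = A A A)
print(representable_within(U, [A, B], 2))  # False
```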

We recall that the algorithmic undecidability of all the problems indicated is proved on the assumption that the normalization principle is valid; however, as noted above, the nature of this undecidability is more profound and, in a certain sense, independent of this principle.
[Footnotes]


13 The word $p$ is termed an initial segment of the word $q$ if the word $q$ has the form $q = pr$, where $r$ is any word (possibly the empty word).


Chapter  2 


BOOLEAN  FUNCTIONS  AND  PROPOSITIONAL  CALCULUS 
§1.  CONCEPT  OF  BOOLEAN  FUNCTIONS 

Boolean (or switching) function is the term customarily given to those functions for which all the arguments, and the functions themselves, can take on only two values.

The role of boolean functions in cybernetics is determined by two basic characteristics. First, boolean functions are a convenient apparatus for describing the circuits of many information converters built on the discrete principle, since with current technology it is far easier to construct discrete elements that function directly in the binary alphabet rather than in some other alphabet. Second, boolean functions are widely used in mathematical logic, which is one of the foundations on which the automation of complex thought processes is built.

The use of boolean variables, which have only two possible values, along with the usual variables which take on numerical values, plays a significant role in the design of various kinds of practical algorithmic systems for programming electronic computers. Boolean functions can also be used successfully to solve certain general questions of the theory of algorithms, for example, to refine the concept of algorithmic complexity.

The two possible values of the variables which figure in the definition of boolean functions can be designated arbitrarily. In practice, however, two notation systems are used most frequently. The first (used when boolean functions are applied in the theory of automata circuits) assigns the notations 0 and 1 to the possible values of the boolean variables. Just as in the case of numerals, we shall call the symbols introduced zero and one, bearing in mind that here zero and one appear not as numerals but only as convenient notations for the letters of the abstract binary alphabet. Later we shall assign these symbols several properties which make it possible to consider them (with one exception) as ordinary numerals (this is precisely the convenience of the notation system being considered). But all such properties must be precisely defined before they are used. In particular, we cannot yet make use of the properties of zero and one which result from the existence of the operations of addition and multiplication for numbers, since we have not yet defined these operations for these symbols.

In the second system of notation, the words "true" and "false" serve as the notations for the two possible values of the boolean variables. This system of notation is used in mathematical logic, primarily in the part called the propositional calculus. Its use is connected with the fact that in the propositional calculus the boolean variables are interpreted as propositional variables, considered from the point of view of the truth or falsity of the propositions.

In the present section and the three following ones we shall use the first system of notation without specifying this each time. When it is necessary to pass from one system of notation to the other, we stipulate that one corresponds to true and zero corresponds to false (we could, of course, assume exactly the opposite correspondence).

Let us consider boolean functions of an arbitrary finite number of arguments. If the number of arguments is equal to $n$, it is customary to term the corresponding function $n$-place. Since each boolean variable can take only two values, the domain of definition of any boolean function is necessarily finite. It is easy to see that the domain of definition of an $n$-place boolean function can consist of at most $2^n$ different elements, namely all possible sets of values of its $n$ arguments. We will usually order the arguments of a given boolean function by assigning them the numbers $1, 2, \ldots, n$. In this case a set of values of the arguments is identified with a cortege (finite ordered sequence) of zeros and ones. For example, the set of values $x_1 = 1$, $x_2 = 0$, $x_3 = 0$ of the arguments of the three-place boolean function $f(x_1, x_2, x_3)$ can be abbreviated as the cortege 100, and the set $x_1 = 0$, $x_2 = 0$, $x_3 = 1$ can be written as the cortege 001. Hereafter we shall frequently term these corteges simply sets (the arguments are always numbered in a definite order, namely the order in which they appear in the notation $f(x_1, x_2, \ldots, x_n)$ of the corresponding boolean function). The term boolean, applied to a cortege (set), means that the cortege is composed of zeros and ones.

Each cortege of length $n$ composed of zeros and ones (a boolean cortege) can be identified with the vertex of the $n$-dimensional unit cube having the corresponding coordinates. For the two-dimensional case, when the cube reduces to a square, the identification of the boolean corteges with the vertices is shown in Fig. 4. Because of this identification, boolean sets (corteges) will sometimes be termed points.

Fig. 4.

In the present chapter we shall limit ourselves (except in specially stipulated cases) to boolean functions whose domain of definition includes all sets of values of their arguments. Thus an $n$-place boolean function must be defined at $2^n$ different points. A boolean function which may be undefined on at least one of the sets is termed a partial boolean function. The consideration of partial boolean functions is useful for the synthesis of the circuits of discrete automata. From the theoretical point of view, boolean functions which are everywhere defined are of particular interest, the more so since, if necessary, every partial boolean function can be redefined (generally speaking, in an arbitrary way) on those sets on which it was not initially defined. Therefore, when speaking of boolean functions hereafter (unless stipulated otherwise), we shall mean these everywhere-defined functions.

We remark also that when considering a particular boolean function we shall regard the number of its arguments as given. This stipulation is necessary because every $n$-place function can be treated as an $(n + 1)$-place, an $(n + 2)$-place, and in general an $(n + k)$-place function for any natural number $k$. For example, a constant function (identically equal to zero or to one) can, if desired, be considered as a function of any number of arguments, on which it does not actually depend. Similarly, to any function $f(x_1, x_2, \ldots, x_n)$ we can add any desired number of new arguments $x_{n+1}, \ldots, x_{n+k}$ on which the values of the function do not actually depend. For this it is sufficient to assume that for all sets of values of the variables $x_1, x_2, \ldots, x_{n+k}$ the following equality holds:

$f(x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+k}) = f(x_1, x_2, \ldots, x_n)$.

We shall term the described operation of converting an $n$-place function into an $(n + k)$-place function the operation of formal assignment of arguments. This operation is obviously applicable to any functions (and not only to boolean ones).
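Formal assignment of arguments is easy to express as a higher-order function. A minimal sketch; the helper name and the name `g` for the enlarged function are hypothetical:

```python
from typing import Callable


def assign_dummy_arguments(f: Callable[..., int], n: int, k: int) -> Callable[..., int]:
    """Formal assignment of arguments: from an n-place function f build an
    (n + k)-place function g with g(x1, ..., x_{n+k}) = f(x1, ..., xn)."""
    def g(*args: int) -> int:
        if len(args) != n + k:
            raise TypeError(f"expected {n + k} arguments, got {len(args)}")
        return f(*args[:n])  # the k added arguments are ignored
    return g


one = assign_dummy_arguments(lambda: 1, n=0, k=3)      # constant 1 as a 3-place function
first = assign_dummy_arguments(lambda x: x, n=1, k=2)  # x1, with x2 and x3 added
print(one(0, 1, 0), first(1, 0, 0), first(0, 1, 1))   # 1 1 0
```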

As we noted above, there are exactly $2^n$ boolean sets (corteges) of length $n$. These sets can be regarded as the representations of certain whole numbers in the binary number system: the set $a_1, a_2, \ldots, a_n$ is identified with the binary representation of the number $a_1 \cdot 2^{n-1} + a_2 \cdot 2^{n-2} + \ldots + a_{n-1} \cdot 2 + a_n$ (here the boolean values 0 and 1 are regarded simply as the usual numbers 0 and 1). We shall term this number the number of the corresponding set. The numbers of the sets range from zero (for the set consisting only of zeros) to $2^n - 1$ (for the set consisting only of ones). The number of the set 010 is $0 \cdot 2^2 + 1 \cdot 2 + 0 = 2$; the number of the set 101 is $1 \cdot 2^2 + 0 \cdot 2 + 1 = 5$; and so on.
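The number of a set can be computed by Horner's scheme, doubling the running value and adding the next digit. A short sketch (the helper name is illustrative), which also checks that `itertools.product` lists the $2^n$ sets in increasing order of their numbers:

```python
from itertools import product


def set_number(bits):
    """Number of the boolean set a1 a2 ... an, i.e. the value of
    a1*2**(n-1) + a2*2**(n-2) + ... + a_(n-1)*2 + a_n."""
    number = 0
    for a in bits:
        number = 2 * number + a
    return number


print(set_number((0, 1, 0)), set_number((1, 0, 1)))  # 2 5

# product((0, 1), repeat=n) yields the sets of length n in increasing order of their numbers.
print([set_number(s) for s in product((0, 1), repeat=3)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```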

Arranging the sets in a column one after the other in order of increasing number and placing alongside each set the value of the boolean function on this set, we obtain the value table of the boolean function. Since on each set the function can take either of two values (0 or 1) regardless of its values on the remaining sets, for $m$ sets we can define exactly $2^m$ different boolean functions (differing from one another by their values on at least one set). Recalling that the total number of sets for $n$ variables is $2^n$, we conclude that the number of different boolean functions of $n$ arguments, which we shall designate $B(n)$, is determined by the equation

$B(n) = 2^{2^n}$.   (9)

With $n = 1$ the quantity $B(n)$ is equal to 4, and each time $n$ increases by 1 this quantity is squared: $B(n + 1) = (B(n))^2$. Thus, while the number of single-place boolean functions is only 4, the number of different two-place (boolean) functions is 16, of three-place functions 256, of four-place functions $256^2 = 65\,536$, of five-place functions $65\,536^2 \approx 4.3$ billion, of six-place functions about $1.8 \cdot 10^{19}$, and so on. The practical possibilities of enumerating all the boolean functions are thus limited to three-place or, at best, four-place functions.
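Formula (9) can be confirmed by brute force for small $n$, which also shows why enumeration quickly stops being practical. In this sketch a function is identified with its value table, listed in the order of the set numbers; the helper name is illustrative.

```python
from itertools import product


def all_boolean_functions(n):
    """Every n-place boolean function, given by its value table: a tuple of
    2**n values listed in increasing order of the set numbers."""
    return list(product((0, 1), repeat=2 ** n))


for n in range(1, 4):
    print(n, len(all_boolean_functions(n)), 2 ** (2 ** n))
# 1 4 4
# 2 16 16
# 3 256 256

# Counts that are no longer enumerable in practice:
print(2 ** (2 ** 5), 2 ** (2 ** 6))  # 4294967296 18446744073709551616
```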

Although every boolean function can be given in the form of its value table, in the majority of practical applications of the theory of boolean functions this method of specification is inconvenient. Therefore one of the primary tasks of our further constructions will be the development of new and more convenient methods of specifying boolean functions. In this connection, the boolean functions of one and two arguments are of particular importance since, as will be shown later, any boolean function can be represented with their aid. We shall therefore make a more detailed study of the single-place and two-place boolean functions.

Of the four single-place functions $\varphi(x)$ which can in general be constructed, two are the constants 0 and 1, which do not explicitly depend on $x$. Another function simply repeats the value of its argument, $\varphi(x) = x$, and is therefore also of no interest. The last, fourth, function, for which we introduce the special notation $\bar{x}$ or $\neg x$, always has the value opposite to that of its argument: $\bar{0} = 1$ and $\bar{1} = 0$. This function is termed inversion or negation. The expression $\bar{x}$ (and also the expression $\neg x$) is read as "negation of x" or "not x." In the theory of boolean functions, and also in the applications of this theory to the synthesis of automata circuits, we shall follow tradition and use the notation $\bar{x}$. In mathematical logic (end of the present chapter and beginning of the sixth chapter) and also in the practical aspects of the theory of algorithms (end of the fifth chapter) it is, for several reasons, more convenient to use the notation $\neg x$.

Of the 16 different two-place boolean functions $f(x, y)$ which can in general be constructed, six reduce to functions of a smaller number of arguments. These are, first, again the two constant functions (0 and 1); second, the two functions which repeat the value of one of the arguments ($x$ or $y$); and, third, the two functions which are the negations of the arguments ($\bar{x}$ and $\bar{y}$).

The ten remaining functions $f(x, y)$, which actually depend on both of their arguments, can be divided into pairs such that the second function of each pair is the negation of the first (i.e., it has on each set the value opposite to the value of the first function). Here the single-place boolean function $\bar{x}$ is in effect used to construct the single-place negation operation on the set of all boolean functions. Applying the negation operation to any boolean function $g$ can be treated as substituting this function in place of the argument $x$ of the function $\bar{x}$. Such a substitution of some boolean functions in place of the arguments of other boolean functions (termed the superposition of these functions) will be widely used hereinafter to form various operations on the set of boolean functions (boolean operations).

To designate the operations thus constructed, we usually use the notation of the boolean functions which generated them. In our case $\bar{g}$ (or $\neg g$) will serve as the notation for the negation of an arbitrary boolean function $g$.

The division of the two-place boolean functions into pairs $(g, \bar{g})$ described above makes it possible to limit ourselves to the description of only five functions, which we select as the first elements of these pairs.
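The classification of the sixteen two-place functions can be checked mechanically. This sketch (helper names are illustrative) represents each function by its value table on the sets 00, 01, 10, 11, tests which arguments it actually depends on, and groups the functions depending on both arguments into pairs consisting of a function and its negation.

```python
from itertools import product

POINTS = list(product((0, 1), repeat=2))  # sets (x, y): 00, 01, 10, 11


def depends_on(table, k):
    """True if the function with value table `table` (indexed like POINTS)
    changes value on some set when argument k alone is flipped."""
    value = dict(zip(POINTS, table))
    for point in POINTS:
        flipped = list(point)
        flipped[k] = 1 - flipped[k]
        if value[point] != value[tuple(flipped)]:
            return True
    return False


def negation(table):
    """Value table of the negation of a function."""
    return tuple(1 - v for v in table)


functions = list(product((0, 1), repeat=4))  # all 16 two-place functions
essential = [t for t in functions if depends_on(t, 0) and depends_on(t, 1)]
pairs = {frozenset((t, negation(t))) for t in essential}
print(len(functions), len(essential), len(pairs))  # 16 10 5
```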

Let us begin the description with conjunction, also termed the (logical) product, or the logical AND operation. In mathematical logic it is customary to designate the conjunction of the variables $x$ and $y$ by $x \& y$ or $x \wedge y$ (we shall use the second of these notations). By definition, the conjunction $x \wedge y$ is equal to one when, and only when, both of its arguments $x$ and $y$ are equal to one.

For the conjunction $x \wedge y$ to be equal to zero it is sufficient that at least one of its arguments ($x$ or $y$) be zero. These properties of the conjunction are completely analogous to the properties which the product $xy$ would have if its factors could take on only the two numerical values 0 and 1. This circumstance suggests considering the boolean constants (0 and 1) as a sort of "pseudo-numbers" for which a multiplication operation is defined that possesses all the properties of the usual multiplication of the numbers 0 and 1:

$0 \cdot 0 = 0, \quad 0 \cdot 1 = 0, \quad 1 \cdot 0 = 0, \quad 1 \cdot 1 = 1.$

In the theory of boolean functions and in its applications to the theory of automata, it is convenient to take precisely this point of view. Moreover, in these cases we shall simply identify the conjunction operation with multiplication, both in name and in notation. In other words, in place of $x \wedge y$ we shall write $x \cdot y$, or simply $xy$, and shall use the terms "product" and "cofactor" together with all the properties of multiplication from conventional elementary algebra. It is easy to see that, because the definitions coincide, multiplication in our case has all the general (identically satisfied) properties of multiplication in conventional algebra (commutativity, associativity, and so on). At the same time, restricting the quantities to two possible values gives logical multiplication some properties that conventional multiplication does not have. For example, the identity $x \cdot x = x$ holds for logical multiplication, but becomes invalid if values other than 0 and 1 are substituted for $x$.
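As a quick illustration, a minimal Python sketch (the helper name `conj` is ours, not the book's) can tabulate conjunction on the values 0 and 1, compare it with ordinary integer multiplication, and check that the idempotency $x \cdot x = x$ holds only on these two values:

```python
BOOL = (0, 1)

def conj(x: int, y: int) -> int:
    """Logical AND: 1 exactly when both arguments are 1 (hypothetical helper)."""
    return 1 if (x == 1 and y == 1) else 0

# Table of the conjunction on all four sets of argument values.
product_table = {(x, y): conj(x, y) for x in BOOL for y in BOOL}

# Conjunction coincides with ordinary multiplication on {0, 1}.
agrees_with_multiplication = all(conj(x, y) == x * y for x in BOOL for y in BOOL)

# x * x == x is true for 0 and 1, but fails once other numbers (e.g. 2) are allowed.
idempotent_on_bool = all(x * x == x for x in BOOL)
idempotent_at_two = (2 * 2 == 2)

print(product_table)  # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(agrees_with_multiplication, idempotent_on_bool, idempotent_at_two)  # True True False
```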

Just as in the case of negation, multiplication (conjunction) can be regarded not only as a function but also as an operation on the set of all boolean functions. To this end it suffices to substitute two arbitrary boolean functions $f$ and $g$ for the independent variables $x$ and $y$ in the product $xy$: $p = fg$. Similarly, any other two-place boolean function $b(x, y)$ defines a two-place or, as is customary in algebra, binary operation on the set of all boolean functions, which we shall name and denote in the same way as the corresponding function $b(x, y)$. In this case, of course, the independent variables $x$ and $y$ are replaced by the arbitrary boolean functions $f$ and $g$. Hereafter we shall use this technique for introducing new binary operations on the set of boolean functions without detailed explanation.

The possibility of interpreting conjunction as conventional multiplication also suggests looking for boolean analogs of conventional (numerical) addition. In contrast with multiplication, a complete analogy is impossible here, since the equality $1 + 1 = 2$ of conventional addition introduces a third quantity (two), which differs from both zero and one. Within the boolean (binary) alphabet this fact cannot be interpreted directly. Therefore we can define two different (but incomplete) analogs of numerical addition for boolean quantities, setting the "sum" of two ones equal either to one or to zero.

The operation (two-place boolean function) that arises under the first assumption is termed disjunction, logical addition, and also logical (the so-called inclusive) OR. To denote this operation we fix the special symbol $\vee$ (the disjunction sign). Thus, the disjunction of two quantities $x$ and $y$ (independent variables or functions) will be denoted $x \vee y$. The quantities $x$ and $y$ themselves are then termed the logical addends or, more frequently, the disjunctive terms.

The system of relations that completely defines the disjunction operation is

$$0 \vee 0 = 0,\quad 0 \vee 1 = 1,\quad 1 \vee 0 = 1,\quad 1 \vee 1 = 1.$$

The first three relations are exactly the same as for conventional (numerical) addition; only the fourth distinguishes logical addition from conventional addition. By these relations, the disjunction of two quantities $x$ and $y$ is equal to zero when and only when both quantities are zero. If even one of them takes the value 1, then the disjunction itself takes the value 1, regardless of the value of the other disjunctive term.

A closer analog of conventional (numerical) addition is obtained when the "sum" of two ones is taken to be zero. The operation (two-place boolean function) that arises in this case is usually termed the nonequivalence operation, exclusive OR, or modulo two addition. The last term reflects the fact that this operation coincides with modulo two addition as defined in number theory if zero and one are treated as ordinary numbers.

For brevity we shall call this operation simply addition and shall use such terms as sum and addend by analogy with conventional addition. We shall denote modulo two addition by the usual sign $+$. To emphasize that we are not dealing with conventional addition, we shall at times put a circle around this sign: $\oplus$.

The addition of boolean quantities is defined by the following four relations:

$$0 + 0 = 0,\quad 0 + 1 = 1,\quad 1 + 0 = 1,\quad 1 + 1 = 0.$$

The first three of them are exactly the same as for conventional (numerical) addition (and the same as for logical addition, i.e., disjunction), so the specific nature of the operation is determined primarily by the fourth relation. This same relation explains the name exclusive OR, which is used for addition in mathematical logic. If we interpret one as true and zero as false, then the sum of two boolean quantities is true when and only when either the first or the second quantity is true, but not when both are true. In the case of the logical sum (inclusive OR), the sum is also true when both addends (disjunctive terms) are true together. OR in this case does not exclude the simultaneous truth of both terms and does not split the question of the truth of the sum into two mutually exclusive cases; this is the source of the term "inclusive" as applied to the OR of the logical sum (disjunction).
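The two incomplete analogs of addition differ only on the set $(1, 1)$. A short Python sketch (function names are illustrative assumptions) makes the contrast explicit by computing disjunction as the larger of the two values and modulo two addition as the remainder of the integer sum on division by 2:

```python
BOOL = (0, 1)

def disj(x: int, y: int) -> int:
    """Inclusive OR (disjunction): the 'sum' of two ones is taken to be 1."""
    return max(x, y)

def mod2_sum(x: int, y: int) -> int:
    """Exclusive OR (modulo two addition): the 'sum' of two ones is taken to be 0."""
    return (x + y) % 2

pairs = [(x, y) for x in BOOL for y in BOOL]  # (0,0), (0,1), (1,0), (1,1)
disj_values = [disj(x, y) for x, y in pairs]
sum_values = [mod2_sum(x, y) for x, y in pairs]

print(disj_values)  # [0, 1, 1, 1]
print(sum_values)   # [0, 1, 1, 0]
```

On the integers 0 and 1, Python's built-in operators `|` and `^` give the same two tables.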

Two more two-place boolean functions result from a single binary operation termed implication, or the operation of logical consequence. We denote this operation by the symbol $\supset$. Implication is defined by the following four relations:

$$0 \supset 0 = 1,\quad 0 \supset 1 = 1,\quad 1 \supset 0 = 0,\quad 1 \supset 1 = 1.$$

In the implication $x \supset y$, in contrast with multiplication, disjunction and addition, the order of the terms is essential. When this order is reversed, the value of the implication changes, so that $x \supset y$ and $y \supset x$ are two different boolean functions.

If we represent a two-place boolean function $f(x, y)$ by the cortege $(a_0 a_1 a_2 a_3)$, where $a_i$ is the value of the function on the set with number $i$ ($i = 0, 1, 2, 3$; the sets $(x, y) = (0,0), (0,1), (1,0), (1,1)$ are numbered in this order), then the implication $x \supset y$ corresponds to the cortege $(1101)$, while the implication $y \supset x$ corresponds to the cortege $(1011)$. We note at the same time that the product, disjunction and sum of the variables $x$ and $y$, regardless of the order in which these variables are written, correspond respectively to the corteges $(0001)$, $(0111)$ and $(0110)$.
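The corteges are easy to compute mechanically. The Python sketch below assumes that the sets $(x, y)$ are numbered 0, 1, 2, 3 in the order $(0,0), (0,1), (1,0), (1,1)$ (the order consistent with the corteges given in the text), which is exactly the order produced by `itertools.product`; the helper names are hypothetical:

```python
from itertools import product

def cortege(f) -> str:
    """Values a0 a1 a2 a3 of a two-place function on the sets numbered 0..3."""
    return "".join(str(f(x, y)) for x, y in product((0, 1), repeat=2))

def implies(a: int, b: int) -> int:
    """Implication, defined by its table: false only for 1 -> 0."""
    return 0 if (a == 1 and b == 0) else 1

five = {
    "xy": lambda x, y: x * y,
    "x v y": lambda x, y: max(x, y),
    "x + y": lambda x, y: (x + y) % 2,
    "x -> y": lambda x, y: implies(x, y),
    "y -> x": lambda x, y: implies(y, x),
}
corteges = {name: cortege(f) for name, f in five.items()}
print(corteges)

# Negation: exchange zeros and ones in the cortege.
negated = {name: c.translate(str.maketrans("01", "10")) for name, c in corteges.items()}
print(negated)
```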

From a consideration of all the corteges presented it follows, incidentally, that all five of the two-place boolean functions we have defined (product, disjunction, sum and the two implications) are pairwise different. It is easy to see that the cortege for the negation of any boolean function is obtained from the cortege of the function itself by replacing all zeros by ones and all ones by zeros. Using this rule, we can determine the corteges for the negation of the product $xy$, the negation of the disjunction $x \vee y$, the negation of the sum $x + y$, and the negations of the two implications $x \supset y$ and $y \supset x$. These corteges are, respectively, $(1110)$, $(1000)$, $(1001)$, $(0010)$ and $(0100)$.

It is easy to verify that the five functions introduced earlier, together with the five new functions (their negations), form a system of ten pairwise different two-place boolean functions. All of them also differ from the constant functions 0 and 1 and from the functions $x$, $y$, $\bar x$, $\bar y$ regarded as functions of the two variables $x$ and $y$, since the latter functions are characterized by the corteges $(0000)$, $(1111)$, $(0011)$, $(0101)$, $(1100)$, $(1010)$ respectively.

Thus, we have written out all 16 two-place boolean functions that can be constructed at all: a cortege of four binary digits can be chosen in exactly $2^4 = 16$ ways.
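The counting argument can be confirmed directly. A Python sketch that takes the sixteen corteges listed in the text and checks that they are pairwise different and exhaust all strings of four binary digits (the ASCII labels such as `~x` and `x -> y` are our own):

```python
from itertools import product

# Corteges named in the text: values on (0,0), (0,1), (1,0), (1,1).
NAMED = {
    "xy": "0001", "x v y": "0111", "x + y": "0110",
    "x -> y": "1101", "y -> x": "1011",
    "~(xy)": "1110", "~(x v y)": "1000", "~(x + y)": "1001",
    "~(x -> y)": "0010", "~(y -> x)": "0100",
    "0": "0000", "1": "1111", "x": "0011", "y": "0101",
    "~x": "1100", "~y": "1010",
}

all_corteges = {"".join(bits) for bits in product("01", repeat=4)}
pairwise_different = len(set(NAMED.values())) == len(NAMED)
exhaustive = set(NAMED.values()) == all_corteges

print(len(all_corteges), pairwise_different, exhaustive)  # 16 True True
```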

Let us make a few more remarks concerning the functions introduced above. The function $\overline{xy}$ (negation of the product), which is characterized by the cortege $(1110)$, and the binary operation it defines are customarily termed the Sheffer stroke. It is easy to verify (using the definitions of negation and disjunction) that the Sheffer stroke can be represented not only as the negation of the product $xy$ but also as the disjunction of the negations, $\bar x \vee \bar y$.

The negation of the disjunction, $\overline{x \vee y}$ (the so-called Peirce function), characterized by the cortege $(1000)$, can also be represented as the product of the negations of the variables $x$ and $y$, i.e., in the form $\bar x \cdot \bar y$. It is easy to see that both the Sheffer stroke and the Peirce function, like the product, disjunction and sum, are symmetric functions, i.e., they do not change their values when the arguments are permuted.

The negation of the sum, $\overline{x + y}$, termed equivalence or logical equivalence, has the same symmetry property. To denote this function, and also the binary operation it defines, we use the special symbols $\sim$ or $\equiv$ (equivalence signs). The function $\overline{x + y} = x \sim y$ is characterized by the cortege $(1001)$. The terms "equivalence" and "nonequivalence" as applied to the functions $x \sim y$ and $x + y$ respectively emphasize that the first function is equal to one when and only when the values of its arguments are equal to one another, and the second when and only when the values of its arguments are unequal.

The function (binary operation) of implication can be expressed through disjunction and negation. It is easy to verify that $x \supset y = \bar x \vee y$ and $y \supset x = x \vee \bar y$. The negation of an implication, also termed the inhibit function, is easily expressed through the product and negation: $\overline{x \supset y} = x \bar y$, $\overline{y \supset x} = \bar x y$. Both implication and the inhibit function are examples of asymmetric boolean functions, since they change their values when the arguments are permuted.
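Each of the representations given above (Sheffer stroke, Peirce function, equivalence, implication and the inhibit function) can be checked on the four sets of argument values. A minimal Python sketch, in which every operation is defined directly by its table and the helper names are hypothetical:

```python
from itertools import product

def neg(a): return 1 - a
def conj(a, b): return a * b
def disj(a, b): return max(a, b)
def mod2(a, b): return (a + b) % 2
def imp(a, b): return 0 if (a == 1 and b == 0) else 1  # table of implication

SETS = list(product((0, 1), repeat=2))

def same(f, g):
    """True when f and g agree on all four sets (x, y)."""
    return all(f(x, y) == g(x, y) for x, y in SETS)

checks = {
    "Sheffer stroke": same(lambda x, y: neg(conj(x, y)), lambda x, y: disj(neg(x), neg(y))),
    "Peirce function": same(lambda x, y: neg(disj(x, y)), lambda x, y: conj(neg(x), neg(y))),
    "equivalence": same(lambda x, y: neg(mod2(x, y)), lambda x, y: int(x == y)),
    "x -> y": same(lambda x, y: imp(x, y), lambda x, y: disj(neg(x), y)),
    "y -> x": same(lambda x, y: imp(y, x), lambda x, y: disj(x, neg(y))),
    "inhibit ~(x -> y)": same(lambda x, y: neg(imp(x, y)), lambda x, y: conj(x, neg(y))),
    "inhibit ~(y -> x)": same(lambda x, y: neg(imp(y, x)), lambda x, y: conj(neg(x), y)),
}
# Implication changes under permutation of its arguments; the Sheffer stroke does not.
implication_symmetric = same(imp, lambda x, y: imp(y, x))
sheffer_symmetric = same(lambda x, y: neg(conj(x, y)), lambda x, y: neg(conj(y, x)))
print(all(checks.values()), implication_symmetric, sheffer_symmetric)  # True False True
```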

In conclusion we note that in reading formulas the conjunction sign $\wedge$ (or $\&$) is read as "and," the disjunction sign $\vee$ as "or," the sum sign $+$ (or $\oplus$) as "plus," the implication sign $\supset$ as "implies," the equivalence sign $\sim$ (or $\equiv$) as "equivalent," and the negation sign (a bar, or $\neg$) as "not."

Each of the ten listed two-place boolean functions defines a corresponding two-place boolean operation, which we shall denote and name exactly as the function that defines it.

§2.  BOOLEAN  ALGEBRA 

By boolean algebra we shall mean the set of all (finite-place) boolean functions considered together with the operations of negation, disjunction and multiplication (conjunction) defined on them.

We shall use the letters $u, v, w, \ldots$ (with or without subscripts) to denote arbitrary elements of boolean algebra, in other words, arbitrary boolean functions. One of the primary problems of boolean algebra is establishing identity relations of the form $A(u, v, w, \ldots) = B(u, v, w, \ldots)$, where $A(u, v, w, \ldots)$ and $B(u, v, w, \ldots)$ denote formulas, i.e., expressions of boolean algebra constructed from a finite number of letters $u, v, w, \ldots$, the signs of the three operations of the algebra, the boolean constants (0 and 1), and parentheses indicating the order in which the operations are performed.

The formulas must be properly constructed. In other words, they must reduce to completely determinate boolean functions once particular boolean functions are chosen as values of the letters $u, v, w, \ldots$ appearing in them. A rigorously formal definition of a properly constructed formula can be given recursively by the following rule: all the letters $u, v, w, \ldots$ (with or without subscripts) and the constants 0 and 1 are properly constructed formulas; if $A$ and $B$ are properly constructed formulas, then $\overline{(A)}$, $(A) \vee (B)$ and $(A) \cdot (B)$ are also properly constructed formulas. The set of properly constructed formulas is taken to coincide with the set of all formulas that can be obtained by sequential (generally speaking, repeated) application of this rule.
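The recursive rule translates directly into a recursive data type. In the Python sketch below, which is an illustrative encoding rather than anything prescribed by the text, a formula is a nested tuple: `("var", name)`, `("const", 0 or 1)`, `("not", A)`, `("or", A, B)` or `("and", A, B)`. The functions check that a tuple is built by the rule, print it with exactly the parentheses the rule produces (`~(A)` stands for the bar over $(A)$), and evaluate it for given values of the letters:

```python
from typing import Dict

def is_formula(f) -> bool:
    """Check that f is built by the recursive construction rule."""
    if not isinstance(f, tuple) or not f:
        return False
    tag = f[0]
    if tag == "var":
        return len(f) == 2 and isinstance(f[1], str)
    if tag == "const":
        return len(f) == 2 and f[1] in (0, 1)
    if tag == "not":
        return len(f) == 2 and is_formula(f[1])
    if tag in ("or", "and"):
        return len(f) == 3 and is_formula(f[1]) and is_formula(f[2])
    return False

def render(f) -> str:
    """Fully parenthesized text: ~(A), (A)v(B), (A)*(B)."""
    tag = f[0]
    if tag == "var":
        return f[1]
    if tag == "const":
        return str(f[1])
    if tag == "not":
        return "~(" + render(f[1]) + ")"
    op = "v" if tag == "or" else "*"
    return "(" + render(f[1]) + ")" + op + "(" + render(f[2]) + ")"

def evaluate(f, env: Dict[str, int]) -> int:
    """Value (0 or 1) of f when the letters take the values given in env."""
    tag = f[0]
    if tag == "var":
        return env[f[1]]
    if tag == "const":
        return f[1]
    if tag == "not":
        return 1 - evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    return max(a, b) if tag == "or" else a * b

# The formula u v vw, built by the rule from the letters u, v, w.
example = ("or", ("var", "u"), ("and", ("var", "v"), ("var", "w")))
print(is_formula(example))                          # True
print(render(example))                              # (u)v((v)*(w))
print(evaluate(example, {"u": 0, "v": 1, "w": 1}))  # 1
print(is_formula(("and", ("var", "u"))))            # False: missing operand
```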

Each additional operation introduced into a formula is accompanied by one or two pairs of parentheses. To avoid excessively cumbersome formulas, we somewhat extend the rule for constructing formulas, allowing some parentheses to be dropped by analogy with elementary algebra. To do this we introduce a rule on the priority of operations: other conditions being equal, negations are performed first, then multiplications, then disjunctions. When operations must be performed in a different order, parentheses are required. In addition, a negation sign written as a bar over an entire expression makes unnecessary the parentheses that would otherwise have had to enclose that expression. It will also be established later that the order in which like operations following directly one after another in a formula are performed is immaterial, so that in this case, too, the parentheses are redundant and can be dropped. Finally, we recall that the multiplication sign between letters can be dropped.

All the properly constructed formulas obtained by means of these extensions will hereafter be termed simply formulas; in them we also permit, in addition to the letters $u, v, w, \ldots$, any other letters of the Latin alphabet.

There is a very simple general rule for verifying the correctness of identity relations in boolean algebra. Its essence is as follows.


Every formula $A(u, v, w, \ldots)$ of boolean algebra can be regarded as the representation of some boolean function of the variables $u, v, w, \ldots$. Indeed, if we assign these variables constant values (0 or 1), then, using the relations that define the operations of negation, disjunction and multiplication (i.e., relations of the form $\bar 0 = 1$, $0 \vee 1 = 1$, and so on), we can find, after a finite number of steps, the value (0 or 1) of the formula $A(u, v, w, \ldots)$ for the chosen values of the variables $u, v, w, \ldots$. This means that our formula is an everywhere-defined boolean function of the variables $u, v, w, \ldots$.

It is easy to see that the identity relation $A(u, v, w, \ldots) = B(u, v, w, \ldots)$ is valid if and only if the formulas $A(u, v, w, \ldots)$ and $B(u, v, w, \ldots)$ represent one and the same boolean function of the variables $u, v, w, \ldots$. To verify this equality of the two representations it is sufficient to check whether or not their values coincide on all sets of values of the variables $u, v, w, \ldots$.

We have thereby constructed a general algorithm suitable for verifying the correctness of any identity relation in boolean algebra: since any finite number of boolean variables has only a finite number of sets of values ($2^n$ sets for $n$ variables), the verification described always terminates after a finite number of steps.
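The verification algorithm is short enough to state as code. A Python sketch, assuming each side of a relation is given as a Python function of $n$ arguments taking the values 0 and 1 (the names `is_identity` and `counterexample` are hypothetical); it inspects all $2^n$ sets of values and so always terminates:

```python
from itertools import product
from typing import Callable, Optional, Tuple

def counterexample(lhs: Callable[..., int], rhs: Callable[..., int],
                   n: int) -> Optional[Tuple[int, ...]]:
    """Return a set of values on which the two sides differ, or None if there is none."""
    for values in product((0, 1), repeat=n):  # all 2**n sets
        if lhs(*values) != rhs(*values):
            return values
    return None

def is_identity(lhs, rhs, n: int) -> bool:
    return counterexample(lhs, rhs, n) is None

def neg(a: int) -> int:
    return 1 - a

# A valid identity: the negation of a product equals the disjunction of the negations.
print(is_identity(lambda x, y: neg(x * y), lambda x, y: max(neg(x), neg(y)), 2))  # True
# Not an identity: x v y = xy fails, first on the set (0, 1).
print(counterexample(lambda x, y: max(x, y), lambda x, y: x * y, 2))  # (0, 1)
```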

Moreover, it becomes clear that it is sufficient to establish identity relations in boolean algebra for the case where all the letters appearing in them are regarded as independent (boolean) variables. When necessary, arbitrary boolean functions can then be substituted for these variables.

We shall denote independent variables by the letters $x, y, z$ (with or without subscripts), and shall use these same letters for writing the identity relations of boolean algebra. We shall verify these relations by substituting into them all possible sets of values of the variables (letters) appearing in them.

As an example, let us consider the commutativity relation for multiplication:

$$xy = yx. \qquad (10)$$

To convince ourselves of the correctness of this relation it is sufficient to note that its left and right sides are equal to zero on the sets $(00)$, $(01)$, $(10)$ and equal to one on the set $(11)$. In view of the triviality of such verification we shall not repeat it in the future and shall limit ourselves to simply writing out the relations we need, which we shall also term laws or rules.

In addition to the relation (law) of commutativity, multiplication also satisfies the so-called law of associativity, expressed by the equality

$$x(yz) = (xy)z. \qquad (11)$$

Multiplication satisfies still another law, usually termed the idempotency law:

$$xx = x. \qquad (12)$$

As a result of this law, the concepts of power and of raising to a power have no real significance in boolean algebra.

The laws of commutativity, associativity and idempotency also extend to the disjunction operation. The corresponding relations are written

$$x \vee y = y \vee x, \qquad (13)$$

$$x \vee (y \vee z) = (x \vee y) \vee z, \qquad (14)$$

$$x \vee x = x. \qquad (15)$$

Multiplication and disjunction are related to one another by the first and second distributive laws, which can be expressed by the relations

$$x(y \vee z) = xy \vee xz; \qquad (16)$$

$$x \vee yz = (x \vee y) \cdot (x \vee z). \qquad (17)$$

We note that, by virtue of the conventions adopted on the priority of operations, the right side of relation (16) is a simplification (obtained by discarding the redundant parentheses and the multiplication sign) of the formula $(x \cdot y) \vee (x \cdot z)$, while the left side of relation (17) is a simplification of the formula $(x) \vee (y \cdot z)$.

For multiplication and disjunction the so-called absorption rules hold, expressed by the following relations:

$$x \vee xy = x; \qquad (18)$$

$$x(x \vee y) = x. \qquad (19)$$

For the negation operation the law of double negation is of great importance:

$$\bar{\bar x} = x. \qquad (20)$$

On  the  strength  of  this  law  any  even  number  of  negations  performed  in 
sequence  does  not  alter  the  result,  while  any  odd  number  is  equivalent 
to  performing  a  single  negation. 

In various transformations in boolean algebra we frequently need the so-called de Morgan rules, which connect all three algebraic operations:

$$\overline{xy} = \bar x \vee \bar y; \qquad (21)$$

$$\overline{x \vee y} = \bar x \, \bar y. \qquad (22)$$

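We point out several more relations that include the constants 0 and 1:

$$x \vee \bar x = 1; \qquad (23)$$

$$x \bar x = 0; \qquad (24)$$

$$x \cdot 0 = 0; \qquad (25)$$

$$x \cdot 1 = x; \qquad (26)$$

$$x \vee 0 = x; \qquad (27)$$

$$x \vee 1 = 1; \qquad (28)$$

$$\bar 1 = 0; \qquad (29)$$

$$\bar 0 = 1. \qquad (30)$$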

Relation (23) is called the law of the excluded middle, and relation (24) the law of contradiction. Relations (25) and (28) can be regarded as particular cases of the absorption rules.
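The whole list (10)-(30) can be confirmed by the truth-table method described earlier. The Python sketch below encodes each law as a pair of functions (left and right side, with the same letters as arguments) and checks them on all sets of values; the encoding is our own, and the number of letters is read from the function's signature:

```python
from inspect import signature
from itertools import product

def neg(a):
    return 1 - a

def disj(a, b):
    return max(a, b)

def conj(a, b):
    return a * b

# Law number -> (left side, right side), both functions of the same letters.
LAWS = {
    10: (lambda x, y: conj(x, y), lambda x, y: conj(y, x)),
    11: (lambda x, y, z: conj(x, conj(y, z)), lambda x, y, z: conj(conj(x, y), z)),
    12: (lambda x: conj(x, x), lambda x: x),
    13: (lambda x, y: disj(x, y), lambda x, y: disj(y, x)),
    14: (lambda x, y, z: disj(x, disj(y, z)), lambda x, y, z: disj(disj(x, y), z)),
    15: (lambda x: disj(x, x), lambda x: x),
    16: (lambda x, y, z: conj(x, disj(y, z)), lambda x, y, z: disj(conj(x, y), conj(x, z))),
    17: (lambda x, y, z: disj(x, conj(y, z)), lambda x, y, z: conj(disj(x, y), disj(x, z))),
    18: (lambda x, y: disj(x, conj(x, y)), lambda x, y: x),
    19: (lambda x, y: conj(x, disj(x, y)), lambda x, y: x),
    20: (lambda x: neg(neg(x)), lambda x: x),
    21: (lambda x, y: neg(conj(x, y)), lambda x, y: disj(neg(x), neg(y))),
    22: (lambda x, y: neg(disj(x, y)), lambda x, y: conj(neg(x), neg(y))),
    23: (lambda x: disj(x, neg(x)), lambda x: 1),
    24: (lambda x: conj(x, neg(x)), lambda x: 0),
    25: (lambda x: conj(x, 0), lambda x: 0),
    26: (lambda x: conj(x, 1), lambda x: x),
    27: (lambda x: disj(x, 0), lambda x: x),
    28: (lambda x: disj(x, 1), lambda x: 1),
    29: (lambda: neg(1), lambda: 0),
    30: (lambda: neg(0), lambda: 1),
}

def holds(lhs, rhs) -> bool:
    """Compare both sides on every set of values of their letters."""
    n = len(signature(lhs).parameters)
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=n))

results = {number: holds(lhs, rhs) for number, (lhs, rhs) in LAWS.items()}
print(all(results.values()), len(results))  # True 21
```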

Let us consider some corollaries of this system of relations. From the laws of commutativity and associativity for disjunction and multiplication it follows that the operations needed to find the value of a product or of a disjunction of any finite number of terms may be performed in any order. This justifies the previously mentioned possibility of writing formulas of the form $x_1 \vee x_2 \vee \cdots \vee x_n$ and $x_1 x_2 \cdots x_n$ without parentheses, with no risk of ambiguity due to varying the order in which the operations are performed.

We note also that, as follows from relations (25) and (28), the presence of even a single one in a disjunction of the form $x_1 \vee x_2 \vee \cdots \vee x_n$ is sufficient to turn the entire disjunction into one, just as the presence of even a single zero cofactor in the product $x_1 x_2 \cdots x_n$ turns the entire product into zero. At the same time, relations (26) and (27) show that in any disjunction the terms equal to zero can be dropped, and in any product the cofactors equal to one can be dropped.

By relation (20), any number of negations performed in sequence reduces either to a single negation or to no negation at all. We shall use $\tilde x$ (read "wavy x") to denote an expression that can be equal to either of the two expressions $x$ or $\bar x$.

Following the rule established above for verifying identity relations in boolean algebra, we shall call formulas that represent the same boolean function of the variables appearing in them equal, or equivalent, to one another. Although the equality or inequality of any two formulas of boolean algebra can in principle be verified by enumerating all possible combinations of values of the variables appearing in them, this method becomes excessively cumbersome as the number of variables grows and is unsuitable in practice. Therefore one of the primary tasks of boolean algebra is the development of more economical methods of establishing the various relations that hold in this algebra.

To solve this problem we can use the previously derived relations (10)-(30), applying them repeatedly and in various combinations. For example, applying relation (12) twice establishes the validity of the relation $xxx = x$, and repeated application of relations (10) and (13) extends the laws of commutativity for the product and disjunction to any desired number of cofactors and, correspondingly, disjunctive terms. Thus it becomes possible to prove various relations in boolean algebra by transforming their left and right sides using relations (10)-(30). If in doing so we manage to reduce the left and right sides of some relation to the same formula, the validity of that relation is thereby established.

It is not clear a priori whether this method makes it possible to derive all the relations that hold in boolean algebra. In fact, however, such a derivation is always possible. To establish this, let us define a standard type of formula to which we shall try to reduce all formulas of boolean algebra. When reducing a particular formula $A$ of boolean algebra to the standard form, we shall always fix some finite set $M$ of boolean variables $x_1, x_2, \ldots, x_n$, which must include all the variables occurring in the formula in question. We shall call every product of variables or their negations (i.e., a product of the form $\tilde x_{i_1} \tilde x_{i_2} \cdots \tilde x_{i_k}$) an elementary product if each letter occurs in the product no more than once.
For example, the products $x_1 \bar x_2$ or $\bar x_1 x_2 \bar x_3$ are elementary, while the products $x_1 \bar x_1$ or $x_3 \bar x_2 x_3$ are nonelementary. We shall include among the elementary products the variables $x_i$ themselves and their negations $\bar x_i$, regarding them as products consisting of a single cofactor. It is also convenient to regard the constant 1 as an elementary product: the product of zero cofactors (an empty set of cofactors). The number of cofactors in a product is called its length. The elementary products for a chosen set $M$ of variables can thus have any length from 0 to $n$ inclusive.

Elementary products of maximal length (in the present case, of length $n$) are customarily termed constituents of unity for the chosen set $M$ of variables. It is easy to see that every constituent of unity contains each variable of the set $M$ (either directly or negated) exactly once, and that the total number of such constituents is $2^n$.
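Constituents of unity can be generated by choosing, for each of the $n$ variables, whether it enters directly or negated, which also shows why there are $2^n$ of them. A Python sketch (the sign-vector encoding and the function names are illustrative assumptions) that lists the constituents for $n = 3$ and checks that each one equals 1 on exactly one set of values, namely the set matching its signs:

```python
from itertools import product

def constituents(n: int):
    """Sign vectors of all constituents: 1 = variable enters directly, 0 = negated."""
    return list(product((0, 1), repeat=n))

def value(signs, point) -> int:
    """Value at `point` of the product of the literals described by `signs`."""
    v = 1
    for s, x in zip(signs, point):
        v *= x if s == 1 else 1 - x
    return v

def as_text(signs, names) -> str:
    """Write a constituent with '~' in place of the negation bar."""
    return "".join(name if s == 1 else "~" + name for s, name in zip(signs, names))

names = ["x1", "x2", "x3"]
cs = constituents(len(names))
print(len(cs))                              # 8
print([as_text(c, names) for c in cs[:3]])  # ['~x1~x2~x3', '~x1~x2x3', '~x1x2~x3']

# Each constituent is 1 only on the set of values equal to its own sign vector.
points = list(product((0, 1), repeat=len(names)))
print(all([p for p in points if value(c, p) == 1] == [c] for c in cs))  # True
```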

A disjunction of any number of elementary products that does not contain two identical products is termed a disjunctive normal form. A disjunctive normal form consisting exclusively of constituents of unity is termed an ideal disjunctive normal form.

Just as in the case of products, this definition does not exclude the possibility that the disjunction in question consists of a single term (a disjunction of length 1), or even of an empty set of terms (a disjunction of length 0). In the latter case the disjunction is taken to be zero by definition. Thus, the formulas $0$, $x_1$, $x_1 \vee \bar x_2 x_3$, $1$ can all be regarded as disjunctive normal forms. The first of these formulas consists of an empty set of terms, the second of a single term, the third of two terms, and the fourth again of a single term, namely the elementary product of zero length.

Replacing, in all these definitions, disjunctions by products, products by disjunctions, the (boolean) constant 0 by the (boolean) constant 1 and vice versa, we obtain respectively the definitions of the elementary disjunctions, the constituents of zero, the conjunctive normal form and the ideal conjunctive normal form.

In boolean algebra, because interchanging zero and one transforms disjunction into conjunction and vice versa, there arises a peculiar duality between the properties of disjunction and conjunction (multiplication). By performing such an interchange, we can automatically obtain, for any property (relation) derived hereafter, its dual property (relation). In particular, to every property of disjunctive normal forms this duality law associates a corresponding property of conjunctive normal forms. Since this association is each time accomplished almost automatically, we shall in the future limit ourselves to considering only disjunctive normal forms.
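The duality can be observed on the level of functions: interchanging 0 and 1 both in the arguments and in the value of a function gives its dual. A Python sketch (the helper `dual` is our own construction, not the book's notation) shows that this turns disjunction into conjunction and vice versa, and that the two distributive laws (16) and (17) are dual to each other:

```python
from itertools import product

def dual(f):
    """Interchange 0 and 1 in the arguments and in the value of f."""
    return lambda *args: 1 - f(*(1 - a for a in args))

def disj(x, y): return max(x, y)
def conj(x, y): return x * y

def same(f, g, n: int) -> bool:
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n))

print(same(dual(disj), conj, 2), same(dual(conj), disj, 2))  # True True

# Left and right sides of the first distributive law (16) ...
lhs16 = lambda x, y, z: conj(x, disj(y, z))
rhs16 = lambda x, y, z: disj(conj(x, y), conj(x, z))
# ... and of the second distributive law (17).
lhs17 = lambda x, y, z: disj(x, conj(y, z))
rhs17 = lambda x, y, z: conj(disj(x, y), disj(x, z))
print(same(dual(lhs16), lhs17, 3), same(dual(rhs16), rhs17, 3))  # True True
```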

Using relations (10), (11), (13)-(16), (23) and (26), we can transform any disjunctive normal form into an equivalent ideal disjunctive normal form. Let us consider this transformation on the example of the disjunctive normal form in three variables $x \vee \bar y z \vee \bar x y z$, which for brevity we shall denote by the single letter $f$.

The third term of this formula is a constituent of unity and therefore requires no transformation. To be a constituent of unity, the second term lacks the factor $\tilde x$ (i.e., $x$ or $\bar x$), and the first term lacks the factors $\tilde y$ and $\tilde z$. On the basis of relations (23) and (26) we can write

$$f = x(y \vee \bar y)(z \vee \bar z) \vee \bar y z (x \vee \bar x) \vee \bar x y z.$$

Using the first distributive law (relation (16)) and relations (10), (11), (13)-(15), we successively bring our form to

$$\begin{aligned} f &= (xy \vee x\bar y)(z \vee \bar z) \vee \bar y z x \vee \bar y z \bar x \vee \bar x y z \\ &= xyz \vee xy\bar z \vee x\bar y z \vee x\bar y \bar z \vee x\bar y z \vee \bar x \bar y z \vee \bar x y z \\ &= xyz \vee xy\bar z \vee x\bar y z \vee x\bar y \bar z \vee \bar x \bar y z \vee \bar x y z. \end{aligned}$$

The last expression in this chain of equalities is the desired ideal disjunctive normal form. We now establish the following important result.
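An ideal disjunctive normal form can also be read off the table of values of a function: it contains one constituent of unity for each set on which the function equals 1 (and is the empty disjunction 0 if there is no such set). A Python sketch, assuming the example form reads $f = x \vee \bar y z \vee \bar x y z$, with `~` standing for the bar and the helper name `ideal_dnf` hypothetical:

```python
from itertools import product

def ideal_dnf(f, names) -> str:
    """One constituent of unity for every set of values on which f equals 1."""
    terms = []
    for point in product((0, 1), repeat=len(names)):
        if f(*point):
            terms.append("".join(n if v else "~" + n for n, v in zip(names, point)))
    return " v ".join(terms) if terms else "0"  # empty disjunction is 0

def f(x, y, z):
    # Assumed reading of the example: x v (not y) z v (not x) y z
    return max(x, (1 - y) * z, (1 - x) * y * z)

print(ideal_dnf(f, ["x", "y", "z"]))
# ~x~yz v ~xyz v x~y~z v x~yz v xy~z v xyz
```

Up to the order of the terms, which is immaterial by (13), the six constituents coincide with the result of the algebraic transformation in the text.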

Theorem 1. With the aid of relations (10)-(30) any formula of boolean algebra can be reduced to the ideal disjunctive normal form.

Indeed, by repeatedly applying de Morgan's rules (21) and (22), the double negation law (20), and also relations (29) and (30), any formula $A(x_1, x_2, \ldots, x_n)$ of boolean algebra can be reduced without difficulty to an equivalent formula $B(x_1, x_2, \ldots, x_n, \bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ that contains no negations other than those applied directly to the letters $x_1, x_2, \ldots, x_n$. The transformations required can be seen from the example

$$\overline{x \vee y\bar{z} \vee \bar{y}z} = \overline{(x \vee y\bar{z})}\cdot\overline{(\bar{y}z)} = \bar{x}\,\overline{(y\bar{z})}\,(y \vee \bar{z}) = \bar{x}\,(\bar{y} \vee z)(y \vee \bar{z}).$$

This technique of successively pushing the negation signs down to the letters is applicable to any formula of boolean algebra.
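The negation-pushing step can be illustrated by a short Python sketch. The nested-tuple encoding of formulas and the names `push_negations` and `evaluate` are hypothetical, chosen only for this illustration. The constants 0 and 1 (relations (29) and (30)) are left out, and the only rules applied are the double negation law (20) and de Morgan's rules (21), (22). Applied to $\overline{x \vee y\bar{z} \vee \bar{y}z}$ (the negation bars of the text's example are reconstructed here), it returns $\bar{x}(\bar{y} \vee z)(y \vee \bar{z})$.

```python
from itertools import product

# Hypothetical encoding: ("var", name) | ("not", F) | ("and", F, G) | ("or", F, G)

def push_negations(f, negate=False):
    """Return an equivalent formula in which negation is applied only to letters."""
    kind = f[0]
    if kind == "var":
        return ("not", f) if negate else f
    if kind == "not":
        # Double negation law (20): two negations cancel out.
        return push_negations(f[1], not negate)
    left, right = f[1], f[2]
    if negate:
        # De Morgan rules (21), (22): the negation moves inward and swaps the operation.
        dual = "or" if kind == "and" else "and"
        return (dual, push_negations(left, True), push_negations(right, True))
    return (kind, push_negations(left), push_negations(right))

def evaluate(f, env):
    """Value of formula f for the variable values given in the dict env."""
    kind = f[0]
    if kind == "var":
        return env[f[1]]
    if kind == "not":
        return 1 - evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    return a & b if kind == "and" else a | b

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
# not(x or y z' or y' z)
A = ("not", ("or", ("or", x, ("and", y, ("not", z))), ("and", ("not", y), z)))
B = push_negations(A)   # x' (y' or z) (y or z')
```

The recursion carries a single "negate" flag downward instead of rewriting the formula several times. This is one convenient way to organize the same rule applications, not necessarily the order the text has in mind.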

The formula $B(x_1, \ldots, x_n, \bar{x}_1, \ldots, \bar{x}_n)$ is built from the letters listed in its designation (with or without negations) using only multiplication and disjunction. Relations (10), (11), (13), (14) and (16) show that such expressions, exactly as in the usual school algebra course (with disjunction playing the role of addition), can be transformed so as to remove all parentheses and collect all like terms. After this transformation, followed by application of relations (25), (26) and (27), our formula becomes a disjunction of certain products of the letters $x_1, x_2, \ldots, x_n$ and their negations. With the aid of relations (10), (12), (24) and (25), each of these products can be transformed into an equivalent elementary product or into zero. Finally, using formulas (27) and (15), we reduce our formula to the ideal disjunctive normal form. An example of this was discussed above. This completes the proof of the theorem.

It is clear that the resulting ideal disjunctive normal form is equivalent to the original formula, since every step described above is an equivalent transformation.

We note that all the steps performed are reversible, so with the aid of relations (10)-(30) we can also carry out the reverse conversion, from the constructed ideal disjunctive normal form back to the original formula $A(x_1, x_2, \ldots, x_n)$.

Theorem 2. For an arbitrary boolean function f of any finite number of variables $x_1, x_2, \ldots, x_n$ there exists one and, up to permutation of the disjunctive terms and of the factors, only one ideal disjunctive normal form in the same set of variables that is equal to it.

To each set $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ of values of the variables $x_1, x_2, \ldots, x_n$ there corresponds exactly one constituent of unity $\tilde{x}_1\tilde{x}_2\cdots\tilde{x}_n$ that becomes unity on this set. This constituent is uniquely defined by the condition $\tilde{x}_i = x_i$ if $\alpha_i = 1$ and $\tilde{x}_i = \bar{x}_i$ if $\alpha_i = 0$ ($i = 1, 2, \ldots, n$). All the remaining constituents take the value zero on this set of values. Since terms equal to zero can be discarded from a disjunction, the disjunction ξ of the constituents of unity corresponding to all the sets on which f equals unity is an ideal disjunctive normal form equal (as a boolean function) to f. It is also clear that any change in the composition of the constituents of unity occurring in ξ will inevitably alter its value table and hence destroy this equality. Consequently, the form ξ is uniquely defined by the function f, Q.E.D.
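The construction used in the proof (one constituent of unity for each set of values on which f equals 1) can be sketched in Python as follows. The function names and the display convention (a trailing ' for a negated letter) are assumptions of this sketch. The test function is the three-variable example treated earlier, taken as $x \vee \bar{y}z \vee \bar{x}yz$ with its negation bars reconstructed.

```python
from itertools import product

def ideal_dnf(f, names):
    """Constituents of unity of f, one for each set of values on which f equals 1.

    A constituent is a tuple of (name, value) pairs: value 1 stands for the
    letter itself and value 0 for its negation.
    """
    return [tuple(zip(names, values))
            for values in product((0, 1), repeat=len(names))
            if f(*values)]

def show(term):
    # Display convention of this sketch: a trailing ' marks a negated letter.
    return "".join(name if value else name + "'" for name, value in term)

def evaluate_dnf(terms, values, names):
    """Value of the disjunction of the given constituents on one set of values."""
    env = dict(zip(names, values))
    return int(any(all(env[n] == v for n, v in term) for term in terms))

def f(x, y, z):
    # x v y'z v x'yz
    return int(x or (not y and z) or (not x and y and z))

xi = ideal_dnf(f, "xyz")
print(" v ".join(show(t) for t in xi))   # prints: x'y'z v x'yz v xy'z' v xy'z v xyz' v xyz
```

The six constituents agree with the ideal form obtained by the step-by-step transformation. Uniqueness shows up here as the fact that the list depends only on the value table of f.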

In view of this uniqueness, the form ξ is customarily called the ideal disjunctive normal form of the function f.

Two  other  important  results  follow  directly  from  theorems  1  and  2. 




Theorem 3. Any boolean function can be represented in the form of a formula of boolean algebra.

Theorem 4. With the aid of relations (10)-(30), every formula of boolean algebra can be transformed into any other formula equivalent to it (i.e., representing the same boolean function).

Indeed, as the formula representing a given boolean function f we can choose its ideal disjunctive normal form. Furthermore, we can always transform any formula A into an equivalent formula B by way of the ideal disjunctive normal form ξ, which, by Theorem 2, is common to A and B. The chain of transformations that takes A into ξ, followed by the chain that reduces B to ξ taken in reverse order (such chains exist by Theorem 1, and all their steps are reversible), constitutes a chain of transformations that takes A into B.

We note that not all of the relations (10)-(30) written out above are used in the transformations of the proof of Theorem 1 (for example, relation (17) is not used). Therefore, if desired, the system of relations (10)-(30) can be shortened in such a way that Theorems 1 and 4 remain valid.

The second remark is that transforming A into an equivalent formula B by way of their common ideal disjunctive normal form ξ was needed only to show that such a conversion is possible in principle. In practice this method usually turns out to be too cumbersome, so we generally look for more direct ways of converting A into B (although, of course, there may sometimes be no route from A to B significantly shorter than the "roundabout" one indicated above).

An important problem that can be solved within the framework of boolean algebra is the problem of minimization of formulas. Its essence is to find a general technique (algorithm) that makes it possible, for any formula of boolean algebra, to find an equivalent formula of the smallest possible complexity.

As the criterion of complexity of a formula, it is most natural to take the number of operations appearing in it. For example, the complexity of the formula $\bar{x}$ would then be 1, and that of $(\bar{x} \vee y)(\bar{y} \vee z)$ would be 5 (two negations, two disjunctions and one multiplication). However, following the tradition established in most works on the minimization problem, we shall use a different criterion and take the complexity of a formula to be the total number of letters appearing in it. Here we mean the number of occurrences of letters (counting repeated occurrences of the same letter), not the number of distinct letters. Thus, for instance, under this criterion the complexity of the formula $(\bar{x} \vee y)(x \vee \bar{y})$ is 4, not 2.
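Both complexity measures are easy to compute on a formula tree. The sketch below uses a hypothetical nested-tuple encoding and illustrative function names. The two sample formulas follow the examples in the text, but the placement of their negation bars is assumed, since only the counts matter here.

```python
# Hypothetical encoding: ("var", name) | ("not", F) | ("and", F, G) | ("or", F, G)

def operations(f):
    """First criterion: number of operations (negations, products, disjunctions)."""
    if f[0] == "var":
        return 0
    return 1 + sum(operations(g) for g in f[1:])

def letters(f):
    """Criterion adopted in the text: letter occurrences, repeats counted."""
    if f[0] == "var":
        return 1
    return sum(letters(g) for g in f[1:])

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
g = ("and", ("or", ("not", x), y), ("or", ("not", y), z))   # (x' v y)(y' v z)
h = ("and", ("or", ("not", x), y), ("or", x, ("not", y)))   # (x' v y)(x v y')
```

For `g`, the operation count is 5 and the letter count is 4. For `h`, the letter count is 4 even though only two distinct letters occur.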

It is not difficult to see that the set M(A) of distinct formulas of boolean algebra in the same variables whose complexity does not exceed that of a fixed formula A is necessarily finite. Therefore the minimization problem formulated above can in principle be solved by enumerating the formulas of M(A) in order of increasing complexity until a formula equivalent to A is found. However, an algorithm based on such enumeration is so cumbersome that it is unsuitable in practice.
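The exhaustive search can be made concrete for a few variables. The following Python sketch illustrates the idea under stated assumptions; it is not the text's procedure verbatim. It encodes each function as a bitmask of its value table and keeps negations on letters only, which, by the reduction described earlier, does not change the number of letter occurrences. It then builds formulas with k letters as products or disjunctions of formulas with a and k - a letters, and stops at the first one that is equivalent to the target. Keeping one representative formula per function at each size is enough, because replacing a subformula by a smaller equivalent one never increases the letter count. All names are hypothetical.

```python
from itertools import product

def truth_mask(fn, n):
    """Encode a boolean function of n variables as a bitmask over its 2**n value sets."""
    mask = 0
    for i, values in enumerate(product((0, 1), repeat=n)):
        if fn(*values):
            mask |= 1 << i
    return mask

def minimal_formula(target, names):
    """Return (letter count, formula text) of a formula for target with fewest letters."""
    n = len(names)
    full = (1 << (1 << n)) - 1
    best = {}                      # mask -> (letter count, formula text)
    layers = {1: {}}               # k -> {mask: text} for masks whose minimum is k
    for i, name in enumerate(names):
        pos = truth_mask(lambda *v, i=i: v[i], n)
        for mask, text in ((pos, name), (full ^ pos, name + "'")):
            if mask not in best:
                best[mask] = (1, text)
                layers[1][mask] = text
    k = 1
    while target not in best:
        k += 1
        layers[k] = {}
        for a in range(1, k):
            for m1, t1 in layers[a].items():
                for m2, t2 in layers[k - a].items():
                    for mask, text in ((m1 & m2, f"({t1} & {t2})"),
                                       (m1 | m2, f"({t1} | {t2})")):
                        if mask not in best:
                            best[mask] = (k, text)
                            layers[k][mask] = text
    return best[target]

# y z' v x z: a function of three variables
mux = truth_mask(lambda x, y, z: (y and not z) or (x and z), 3)
print(minimal_formula(mux, "xyz"))
```

For $y\bar{z} \vee xz$ it should report 4 letters. Even in this compressed form, the number of candidate functions grows as $2^{2^n}$, which illustrates why the method is impractical.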

The problem of constructing more economical algorithms for minimizing formulas of boolean algebra has not yet been solved in general form. Therefore, in practice we usually restrict ourselves to finding the minimal formula within a particular class of formulas, above all the class of all disjunctive normal forms. This problem is usually called the problem of minimization of boolean functions. The name is not entirely accurate, since what is minimized is not the function (which remains unchanged in the process) but the formulas that represent it (in the present case, the disjunctive normal forms).

All methods of minimization in the class of disjunctive normal forms are based on the concept of a prime implicant. An implicant of a boolean function f is any boolean function g that can take the value 1 only on those sets of values of the variables on which f also takes the value 1. We shall say that the implicant g covers with its unities some of the unities of the function f. From the properties of disjunction it follows that the disjunction of any finite set of implicants $g_1, g_2, \ldots, g_n$ of f is again an implicant of f. If, moreover, the unities of the implicants $g_1, g_2, \ldots, g_n$, taken all together, cover all the unities of f, then this disjunction simply coincides with f: $g_1 \vee g_2 \vee \cdots \vee g_n = f$.

The converse is also clear: every term of a disjunction that coincides with f is an implicant of f, and the unities of all the terms of this disjunction together cover all the unities of f. In particular, every disjunctive normal form of a boolean function f can be regarded as a covering of f by the set of all its terms, each of which is an implicant of f. In this case the elementary products play the role of implicants.

We note that when an elementary product is shortened (by dropping some of its factors), the number of unities it covers increases. An elementary product of maximal length n in n variables (a constituent of unity) takes the value 1 at only one point, while an elementary product of length $n - k$ takes the value 1 at $2^k$ points. It is therefore advantageous to cover a given function f by elementary products of the smallest possible length, i.e., by elementary products that are themselves implicants of f but none of whose proper parts (obtained by dropping some of the factors) is an implicant of f. Such elementary products are customarily called prime implicants of the boolean function in question.

The set of all prime implicants of any boolean function f covers all its unities. Therefore the disjunction of all prime implicants of f, called the reduced disjunctive normal form of this function, represents f. However, this representation will usually not be the most economical, since some prime implicants may cover unities that are already covered by the remaining ones. Discarding all such redundant implicants from the reduced form, we transform it into a so-called irreducible disjunctive normal form of the function f.

A boolean function can, generally speaking, have not one but several irreducible disjunctive normal forms. For instance, the function of the three variables x, y, z that equals zero only on the sets (000) and (111) and unity on all the remaining sets has five different irreducible disjunctive normal forms. At the same time, one can show that every two-place boolean function has a single irreducible disjunctive normal form.
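For small numbers of variables, prime implicants and irreducible forms can be found by brute force, which is handy for checking the claims above. The Python sketch below is hypothetical (the names and the encoding of elementary products as sets of (variable index, value) pairs are illustrative). It is not one of the economical algorithms mentioned below: it simply enumerates all elementary products and all subsets of prime implicants.

```python
from itertools import combinations, product

def true_sets(f, n):
    """All sets of values on which f equals 1."""
    return {p for p in product((0, 1), repeat=n) if f(*p)}

def covers(term, point):
    # term: frozenset of (variable index, value); value 0 means a negated letter
    return all(point[i] == v for i, v in term)

def prime_implicants(f, n):
    ones = true_sets(f, n)
    points = list(product((0, 1), repeat=n))
    implicants = []
    # Each variable is absent from the product, negated (0) or plain (1).
    for choice in product((None, 0, 1), repeat=n):
        term = frozenset((i, v) for i, v in enumerate(choice) if v is not None)
        if all(p in ones for p in points if covers(term, p)):
            implicants.append(term)
    # Prime: no proper part of the product is itself an implicant.
    return [t for t in implicants if not any(s < t for s in implicants)]

def irreducible_dnfs(f, n):
    """All sets of prime implicants that cover every unity of f and
    contain no redundant term (brute force over subsets)."""
    ones = true_sets(f, n)
    primes = prime_implicants(f, n)

    def covered(terms):
        return {p for p in ones if any(covers(t, p) for t in terms)}

    result = []
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            if covered(subset) == ones and all(
                    covered(subset[:i] + subset[i + 1:]) != ones for i in range(r)):
                result.append(subset)
    return result

def not_all_equal(x, y, z):
    # Zero only on (000) and (111)
    return int(not (x == y == z))

forms = irreducible_dnfs(not_all_equal, 3)
```

For the function that vanishes only on (000) and (111), the sketch finds six prime implicants and five irreducible forms. Two of these forms, with six letters each, are minimal. Each of the sixteen two-place functions yields exactly one irreducible form.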

It is easy to see that the irreducible disjunctive normal forms of any boolean function f necessarily include all of its minimal disjunctive normal forms (there may be several of them), i.e., those disjunctive normal forms of f that contain the smallest number of letters among all disjunctive normal forms of this function.

We can construct sufficiently economical algorithms for finding all the prime implicants and all the irreducible disjunctive normal forms of any boolean function. However, for selecting the minimal forms from among the irreducible disjunctive normal forms there is, in the general case, no method significantly simpler than successively enumerating and comparing all the irreducible disjunctive normal forms (see Zhuravlev [36]).

One of the most effective algorithms for finding the prime implicants and the irreducible disjunctive normal forms is the algorithm proposed by Blake [8]. Its essence is as follows. It is not difficult to establish that the identity

$$AB \vee \bar{A}C = AB \vee \bar{A}C \vee BC \tag{31}$$

holds in boolean algebra.

If in this relation we take A to be a letter and B and C to be elementary products, relation (31) yields a rule for identity transformations of disjunctive normal forms. If a form contains two terms of the form xp and $\bar{x}q$, it may be supplemented with the new term (elementary product) pq. It is possible, of course, that this term vanishes or coincides with one of the terms already present in the form. Since the total number of elementary products in the given variables is finite, it is easy to see that, after a finite number of applications of this rule, new terms will no longer be obtained. Blake's result amounts to the statement that the transformed form of f, after reaching such "stabilization", contains all the prime implicants of the boolean function it represents.

After obtaining a disjunctive normal form g that contains all the prime implicants of its function, it is not difficult to free it of all terms that are not prime implicants. Indeed, if some elementary product P in g is not a prime implicant, then, being in any case an implicant of the function g, it contains some prime implicant p of this function and can consequently be represented in the form P = pq. Since p is itself a term of g, it can be used to exclude the term P = pq from g by relation (18): $p \vee pq = p$. Such an exclusion is usually called the elementary absorption operation. Applying it to the form g removes, after a finite number of steps, all terms that are not prime implicants, and thereby converts g into the reduced disjunctive normal form $g_0$.
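Blake's procedure (consensus closure followed by elementary absorption) can be sketched in Python. The string notation parsed by `parse` (a trailing ' marks a negated letter) and all function names are assumptions of this sketch. The `consensus` function implements the rule derived from (31). It applies only when exactly one letter occurs with opposite signs in the two terms; otherwise the product pq contains some $y\bar{y}$ and vanishes.

```python
from itertools import combinations

def parse(text):
    """Hypothetical notation: "x'yz'" stands for x-bar y z-bar."""
    literals, i = set(), 0
    while i < len(text):
        negated = i + 1 < len(text) and text[i + 1] == "'"
        literals.add((text[i], 0 if negated else 1))
        i += 2 if negated else 1
    return frozenset(literals)

def show(term):
    return "".join(n if v else n + "'" for n, v in sorted(term))

def consensus(p, q):
    """Relation (31): from terms x*P and x'*Q obtain P*Q, or None if not applicable."""
    opposed = [(n, v) for n, v in p if (n, 1 - v) in q]
    if len(opposed) != 1:
        return None          # no opposed letter, or P*Q would contain some y*y' = 0
    n, v = opposed[0]
    return (p - {(n, v)}) | (q - {(n, 1 - v)})

def blake_closure(terms):
    """Add consensus terms until no new term appears ("stabilization")."""
    terms = set(terms)
    while True:
        new = set()
        for p, q in combinations(terms, 2):
            r = consensus(p, q)
            if r is not None and r not in terms:
                new.add(r)
        if not new:
            return terms
        terms |= new

def absorb(terms):
    """Elementary absorption, relation (18): p v pq = p."""
    return {t for t in terms if not any(s < t for s in terms)}

f = [parse("x'yz'"), parse("xy"), parse("xy'z")]   # x'yz' v xy v xy'z
closed = blake_closure(f)
reduced = absorb(closed)
print(sorted(show(t) for t in reduced))   # ['xy', 'xz', "yz'"]
```

On $\bar{x}y\bar{z} \vee xy \vee x\bar{y}z$ (the example worked at the end of this section, with its negation bars reconstructed), the closure has five terms, and absorption leaves $xy \vee y\bar{z} \vee xz$.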

To pass from the form $g_0$ to an irreducible disjunctive normal form, we can find the redundant terms of $g_0$ by the same method of Blake. If some term of $g_0$ (or of any other disjunctive normal form consisting of prime implicants) can be obtained from the remaining terms by applying relation (31), possibly more than once, then this term is redundant and can be excluded.

By applying this exclusion process repeatedly, we reduce the form $g_0$ to an irreducible disjunctive normal form $g_1$. Indeed, by Blake's result presented above, relation (31) allows us to recover from $g_1$ all the prime implicants appearing in $g_0$. But no further exclusion of terms from $g_1$ is possible: if we attempted such an exclusion, at least one of the unities of the function $g_1$ would be left uncovered. Clearly, the excluded prime implicant that covered this unity could not then be recovered from the disjunction of the remaining terms by any elementary transformations, in particular by applying identity (31).
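The exclusion of redundant terms can be sketched as follows. As a stated substitution, this hypothetical Python code tests redundancy directly on the value table instead of deriving the term through relation (31). In this test, a term is redundant if every unity it covers is also covered by some other term. For disjunctions of prime implicants the two tests should pick out the same terms. Terms are tried in the order given, so a different order can yield a different irreducible form.

```python
from itertools import product

def parse(text):
    """Hypothetical notation: "yz'" stands for y z-bar (a trailing ' negates a letter)."""
    literals, i = set(), 0
    while i < len(text):
        negated = i + 1 < len(text) and text[i + 1] == "'"
        literals.add((text[i], 0 if negated else 1))
        i += 2 if negated else 1
    return frozenset(literals)

def covers(term, point):
    """True if the elementary product term equals 1 at the given set of values."""
    return all(point[name] == value for name, value in term)

def is_redundant(term, others, names):
    """True if every unity covered by term is also covered by one of the others."""
    for values in product((0, 1), repeat=len(names)):
        point = dict(zip(names, values))
        if covers(term, point) and not any(covers(o, point) for o in others):
            return False
    return True

def drop_redundant(terms, names):
    """Try to exclude terms one at a time, in the given order.

    Removing a term can only make the remaining terms less redundant, so terms
    already kept stay necessary and a single pass suffices."""
    kept = list(terms)
    i = 0
    while i < len(kept):
        others = kept[:i] + kept[i + 1:]
        if is_redundant(kept[i], others, names):
            kept = others
        else:
            i += 1
    return kept

reduced = [parse("xy"), parse("yz'"), parse("xz")]   # xy v yz' v xz
irreducible = drop_redundant(reduced, "xyz")         # yz' v xz
```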

To obtain all the irreducible disjunctive normal forms, the described exclusion method should be applied several times, varying the order in which the terms are tried for exclusion. As mentioned above, finding the minimal disjunctive normal forms requires a laborious enumeration of all the irreducible disjunctive normal forms (which may differ in complexity). Therefore, in practice the solution of the minimization problem is usually limited to finding some one, arbitrarily selected, irreducible disjunctive normal form.

As an example of the application of the Blake algorithm, we shall show the minimization of the disjunctive normal form $f = \bar{x}y\bar{z} \vee xy \vee x\bar{y}z$.

Applying relation (31) to the pair formed by the first and second terms and to the pair formed by the second and third terms, we transform f into the form $f_1 = \bar{x}y\bar{z} \vee xy \vee x\bar{y}z \vee y\bar{z} \vee xz$. Applying relation (31) to any pair of terms of $f_1$ produces no new terms. Consequently, all the prime implicants of the function f (the function represented by the form f) are contained in $f_1$.

Applying the elementary absorption operation to $f_1$ gives the reduced disjunctive normal form $f_2 = xy \vee y\bar{z} \vee xz$. The first term of $f_2$ can be obtained with the aid of relation (31) from the remaining two terms and is therefore redundant. Excluding it, we arrive at the form $f_3 = y\bar{z} \vee xz$, which contains no redundant terms and is therefore the desired irreducible disjunctive normal form. In this case the irreducible disjunctive normal form is unique, and consequently it coincides with the minimal disjunctive normal form.

More detailed information on the various methods of minimizing formulas of boolean algebra can be found in specialized monographs on the theory of synthesis of circuits of discrete automata (see, for example, Glushkov [26]). Some additional information on this question is also presented in §4 of the present chapter.

§3. THE CONCEPT OF COMPLETE SETS OF BOOLEAN OPERATIONS

Theorem 3 of the preceding section shows that, to represent any boolean function by a formula constructed from the arguments and the boolean constants 0 and 1, it suffices to use three types of boolean operations in all: negation, multiplication and disjunction. Every set of boolean operations possessing this property is customarily called a complete set.

Besides the set consisting of negation, multiplication and disjunction, other complete sets of boolean operations can also be constructed. From de Morgan's relations (21) and (22), written out in the preceding section, it follows that disjunction can be expressed through negation and multiplication, and multiplication through negation and disjunction. Therefore a complete set of boolean operations can be composed of negation and either of the two remaining operations of boolean algebra (multiplication or disjunction).

From the operations of any complete set one can construct any boolean operation, in particular negation, disjunction and multiplication. To perform the required construction, it is obviously sufficient to represent the boolean function that defines the required operation by means of the operations of the complete set under consideration. Conversely, if from the operations of some set we can construct negation and multiplication, or negation and disjunction, then, by what was said above, this suffices for representing any boolean function and, consequently, for the completeness of the set. As a result we arrive at the following proposition.

Theorem 1. For a set of boolean operations to be complete, it is necessary and sufficient that the operations of this set allow us to construct the function $\bar{x}$ and one of the functions xy or $x \vee y$.

Using the completeness criterion of Theorem 1, we can establish the completeness of many sets of boolean operations relatively easily. One such set, in particular, consists of multiplication and addition modulo two. Indeed, it is easy to verify that the relation

$$\bar{x} = x + 1 \tag{32}$$

holds. Thus negation can be expressed through addition (with the constant 1). Since multiplication itself belongs to the set in question, Theorem 1 shows that this set is complete.
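Relation (32) and the resulting completeness argument can be checked mechanically. In this small Python sketch (the helper names are illustrative), `^` is addition modulo two and `&` is multiplication. Disjunction is rebuilt from them through de Morgan's rule, and its value table agrees with the Zhegalkin form x + y + xy discussed later in this section.

```python
from itertools import product

# Only multiplication (&), addition modulo two (^) and the constant 1 are used.

def neg(x):
    return x ^ 1                  # relation (32): x' = x + 1

def disj(x, y):
    return neg(neg(x) & neg(y))   # de Morgan's rule, written with (32)

for x, y in product((0, 1), repeat=2):
    print(x, y, neg(x), disj(x, y))
```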

The operations of addition and multiplication give rise to yet another interesting algebra of boolean functions, called the Zhegalkin algebra. In its general properties (expressed by identity relations), this algebra comes closest to the algebra of ordinary addition and multiplication studied in high school. Like ordinary addition, addition modulo two is associative and commutative (for boolean multiplication these properties were established in the preceding section). The distributive law x(y + z) = xy + xz also holds, so parentheses can be removed from expressions just as in ordinary algebra.

After removal of parentheses, any formula of the Zhegalkin algebra takes the form of a sum of products of variables. These may include products consisting of a single factor (single letters) and the product with no factors at all (the constant 1). On the basis of the relation xx = x and the commutativity of multiplication, we may assume that no letter occurs more than once in any of the resulting products.

Identical products, just as in ordinary algebra, are regarded as like terms and are subject to reduction of like terms. The rules for this reduction differ from those of ordinary algebra and ultimately amount to the easily verified identity

$$x + x = 0. \tag{33}$$

Thus any even number of identical addends cancel one another, while any odd number is equivalent to a single addend, since zero addends do not alter the value of the sum and can be struck out immediately.

The reduction of like terms completes our description of the process that we shall call the reduction of formulas of the Zhegalkin algebra to canonical form. Let us illustrate this process with an example. Let there be given the formula $f = (x + y)(x + z) + y(z + x)$ of the Zhegalkin algebra. After removal of the parentheses, this formula takes the form $f_1 = x + xy + xz + yz + yz + xy$. After combining like terms, the terms xy and yz, each occurring twice, cancel, and the formula reduces to the final (canonical) form $f_2 = x + xz$.

Since the set of operations of the Zhegalkin algebra is complete and any of its formulas can be reduced to canonical form, every boolean function f can be represented in the Zhegalkin algebra by a formula of canonical form. It is not difficult to show that this formula is determined uniquely by the function f, up to a possible permutation of addends and factors. We shall call it the canonical polynomial of the boolean function f.
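In the text, the canonical polynomial is obtained by expanding a formula. A different, standard route, named here explicitly as a substitute technique, computes it straight from the value table with the binary Möbius transform. With this transform, the coefficient of a product of variables is the sum modulo two of the values of f over all sets obtained from the set marking those variables by turning some of its ones into zeros. The Python sketch below is illustrative, and its function and variable names are assumptions.

```python
from itertools import product

def canonical_polynomial(f, names):
    """Monomials (tuples of variable names) of the canonical polynomial of f.

    In-place binary Moebius transform of the value table: after processing
    every variable, coeff[p] is the mod-2 sum of f over the sets below p."""
    n = len(names)
    points = list(product((0, 1), repeat=n))
    coeff = {p: int(bool(f(*p))) for p in points}
    for i in range(n):
        for p in points:
            if p[i] == 1:
                q = p[:i] + (0,) + p[i + 1:]   # same set with position i set to 0
                coeff[p] ^= coeff[q]
    return {tuple(name for name, bit in zip(names, p) if bit)
            for p in points if coeff[p]}

def f(x, y, z):
    # (x + y)(x + z) + y(z + x), with + as addition modulo two
    return ((x ^ y) & (x ^ z)) ^ (y & (z ^ x))

print(canonical_polynomial(f, "xyz"))   # the monomials x and xz
```

The empty tuple stands for the constant 1. For example, negation yields the monomials 1 and x, in agreement with relation (32).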

The uniqueness of the canonical polynomial can be established by simple reasoning. Let $f_1$ and $f_2$ be two different canonical polynomials of the boolean function f. Being equal to this function, the polynomials $f_1$ and $f_2$ are equal to one another as functions (for all values of the variables). In the equality $f_1 = f_2$, identical terms on the right and left sides can be cancelled (by adding each such term to both sides and applying (33)). After this cancellation we obtain an identity $f_1' = f_2'$ in which no addend appears on both sides.

Let us fix one of the addends p that consists of the smallest number of letters among all the addends remaining on both sides. Then each of the other addends contains at least one letter that does not occur in p. Let us choose the set of values of the variables so that all the letters appearing in p take the value 1 and all the remaining letters take the value 0. By the remark just made, only one addend, namely p, becomes unity on this set, and all the remaining addends equal zero. But then the relation $f_1' = f_2'$ turns into 1 = 0 (or 0 = 1), which is impossible if the original relation $f_1 = f_2$ was an identity. We have thereby refuted the assumption, made at the beginning of the argument, that there exist two different (though equal to one another as functions) canonical polynomials $f_1$ and $f_2$ of the same boolean function f.

Canonical polynomials that contain no products of two or more variables represent the so-called linear boolean functions. These are the polynomials that are sums of individual letters and, possibly, the constant 1, together with the polynomial identically equal to zero. All the remaining boolean functions are nonlinear. In accordance with this division of functions, the boolean operations they determine are also divided into linear and nonlinear operations.

It is easy to see that any superposition (substitution of one function into another) of linear boolean functions is again linear. Clearly, this means that no nonlinear operation can be constructed with the aid of linear (boolean) operations alone. Hence every complete set of boolean operations must include at least one nonlinear operation.

The operations of negation and addition (modulo two) are linear, since the canonical polynomials representing their boolean functions are the linear formulas x + 1 and x + y. At the same time, the functions xy and $x \vee y$, and consequently the multiplication and disjunction operations they define, are nonlinear. The first has the canonical polynomial xy, and the second has x + y + xy. Both formulas contain the nonlinear term xy.
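Linearity can be tested without computing the canonical polynomial. The sketch below uses a characterization that is not stated in the text but is standard. Under it, f is linear in the sense above exactly when $f(a + b) = f(a) + f(b) + f(0)$ for all sets a and b, where addition modulo two is taken position by position. The names are hypothetical.

```python
from itertools import product

def is_linear(f, n):
    """Affine test: f(a + b) = f(a) + f(b) + f(0) for all sets a, b (mod 2)."""
    zero = f(*([0] * n))
    points = list(product((0, 1), repeat=n))
    for a in points:
        for b in points:
            s = tuple(u ^ v for u, v in zip(a, b))   # position-wise sum modulo two
            if f(*s) != f(*a) ^ f(*b) ^ zero:
                return False
    return True

print(is_linear(lambda x, y: x ^ y, 2), is_linear(lambda x, y: x & y, 2))
```

Negation (x + 1) and addition pass the test, while multiplication and disjunction fail it.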

Let us introduce yet another division of boolean functions and of the corresponding boolean operations into two classes: the class of monotone functions (operations) and the class of nonmonotone functions (operations). To do this, let us define the order relation $\le$ on the sets of values of boolean variables. We set $0 \le 0$, $0 \le 1$, $1 \le 1$. For any two sets of the same length $(\alpha_1\alpha_2\ldots\alpha_n)$ and $(\beta_1\beta_2\ldots\beta_n)$, the relation $(\alpha_1\alpha_2\ldots\alpha_n) \le (\beta_1\beta_2\ldots\beta_n)$ holds if and only if $\alpha_i \le \beta_i$ for all $i = 1, 2, \ldots, n$. If two such sets are different, we shall say that the first of them is smaller and the second larger. Note that certain sets, for example (01) and (10), are incomparable under this definition, since neither of them can be regarded as larger or smaller than the other.
A boolean function f is called monotone if, on passing from any smaller set A of values of its variables to any larger set B, the value of the function cannot decrease, i.e., cannot change from 1 on A to 0 on B. If, however, for even one pair of sets A, B with $A \le B$ we have f(A) = 1 and f(B) = 0, then f is called a nonmonotone boolean function. The boolean operations determined by these functions are correspondingly divided into monotone and nonmonotone operations.
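The definition of monotonicity translates directly into a brute-force check over all comparable pairs of sets. In this hypothetical Python sketch, `violation` returns a pair A ≤ B with f(A) = 1 and f(B) = 0 when one exists. This is the kind of pair used in the proof of Theorem 2 below.

```python
from itertools import product

def precedes(a, b):
    """The order relation on sets of values: a <= b in every position."""
    return all(u <= v for u, v in zip(a, b))

def violation(f, n):
    """Return sets A <= B with f(A) = 1 and f(B) = 0, or None if f is monotone."""
    points = list(product((0, 1), repeat=n))
    for a in points:
        for b in points:
            if precedes(a, b) and f(*a) and not f(*b):
                return a, b
    return None

def is_monotone(f, n):
    return violation(f, n) is None

print(violation(lambda x: 1 - x, 1))   # ((0,), (1,))
```

Multiplication and disjunction have no such pair. Negation and addition modulo two do.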

For any superposition (substitution of a function into a function) of monotone boolean functions we again obtain monotone functions. Indeed, suppose the functions $f(y_1, y_2, \ldots, y_m)$ and $\varphi(x_1, x_2, \ldots, x_n)$ are monotone and φ is substituted, say, in place of the variable $y_1$. Then, by the monotonicity of φ, with any increase of the set of values of the variables $y_2, y_3, \ldots, y_m, x_1, x_2, \ldots, x_n$, the set of values of the variables $\varphi, y_2, y_3, \ldots, y_m$ either increases or remains unchanged. In both cases the value of the composite function $f(\varphi, y_2, y_3, \ldots, y_m)$ cannot decrease, by the monotonicity of $f(y_1, y_2, \ldots, y_m)$. This proves its monotonicity.

Translated into the language of operations, the fact just established means that with the aid of monotone boolean operations alone we cannot construct any nonmonotone boolean operation (for example, negation). Hence every complete set of boolean operations must necessarily include at least one nonmonotone operation.

The simplest example of a nonmonotone operation is negation. It turns out, moreover, that in a certain sense every nonmonotone operation contains the negation operation. More precisely, the following proposition is valid.

Theorem 2. The negation operation can be constructed with the aid of any nonmonotone boolean operation.

Let us consider an arbitrary nonmonotone boolean operation. It is defined by some nonmonotone boolean function $f(x_1, x_2, \ldots, x_n)$. In view of the nonmonotonicity of f, there are two sets A and B of values of its variables such that A is smaller than B, while f takes the value 1 on the set A and the value 0 on the set B. The set A differs from the set B in that, at certain of the positions where B has ones, A has zeros. If we replace these zeros by ones one at a time, sooner or later we arrive from the set A, where $f(A) = 1$, at the set B, where $f(B) = 0$. Consequently, at one of these successive replacements of a zero by a one the value of f must change from 1 to 0. This means that for some i ($1 \le i \le n$)

$$f(\alpha_1, \ldots, \alpha_{i-1}, 0, \alpha_{i+1}, \ldots, \alpha_n) = 1 \quad \text{and} \quad f(\alpha_1, \ldots, \alpha_{i-1}, 1, \alpha_{i+1}, \ldots, \alpha_n) = 0,$$

where $\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_n$ are certain boolean constants (0 or 1). But then, as is easy to see, the boolean function $f(\alpha_1, \ldots, \alpha_{i-1}, x, \alpha_{i+1}, \ldots, \alpha_n)$ of the single variable x can be nothing other than the negation of this variable, i.e., $\bar{x}$. Interpreting the function f as a boolean operation, we obtain the required representation of negation with the aid of this operation.
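The proof of Theorem 2 is constructive, so it can be followed step by step in code. This Python sketch searches for a set on which raising a single variable from 0 to 1 drops the value of f from 1 to 0. It then freezes the remaining variables at the constants found. The function name negation_from is hypothetical:

```python
from itertools import product

def negation_from(f, n):
    """Follow the proof of Theorem 2. Returns a one-place function obtained
    from f by fixing all variables but one, or None if f is monotone."""
    for point in product((0, 1), repeat=n):
        for i in range(n):
            if point[i] == 1:
                continue
            raised = point[:i] + (1,) + point[i + 1:]
            if f(*point) == 1 and f(*raised) == 0:
                def neg(x, i=i, point=point):
                    args = list(point)
                    args[i] = x          # only x_i varies; the rest are constants
                    return f(*args)
                return neg
    return None

implication = lambda x, y: (1 - x) | y
sheffer = lambda x, y: 1 - (x & y)
for op in (implication, sheffer):
    neg = negation_from(op, 2)
    print(neg(0), neg(1))                     # 1 0 for both operations
print(negation_from(lambda x, y: x & y, 2))  # None: multiplication is monotone
```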

In classifying the boolean operations we shall exclude from consideration the zero-place operations (the constants 0 and 1), and also the trivial single-place operation which repeats the value of its argument. It is also natural not to distinguish between operations which arise from the same boolean function under various permutations of its arguments. Taking account of this, we have a single one-place operation, negation $\bar{x}$, and eight two-place operations: multiplication $xy$, disjunction $x \vee y$, addition $x + y$, equivalence $x \sim y$, implication $x \supset y$, the inhibit operation $x\bar{y}$, the Sheffer operation $\bar{x} \vee \bar{y}$ and the Pierce operation $\bar{x}\bar{y}$.

It is easy to verify that only two of the nine boolean operations listed are monotone: multiplication and disjunction. Only three operations are linear: negation, addition and equivalence. Thus we arrive at the following result.

Theorem 3. Among the nine single-place and two-place boolean operations, multiplication and disjunction are nonlinear (but monotone), while negation, addition and equivalence are nonmonotone (but linear). The remaining four operations, inhibit, implication, Sheffer and Pierce, are both nonlinear and nonmonotone.
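The classification in Theorem 3 can be verified by direct computation. This Python sketch assumes that the inhibit, Sheffer and Pierce operations are $x\bar{y}$, $\bar{x} \vee \bar{y}$ and $\bar{x}\bar{y}$, and the helper names are illustrative. It tests monotonicity by brute force. It tests linearity through the coefficients of the canonical polynomial: the coefficient of a product of letters is the sum mod 2 of f over all sets lying below the set that marks those letters.

```python
from itertools import product

def sets(n):
    return list(product((0, 1), repeat=n))

def is_monotone(f, n):
    return all(f(*a) <= f(*b) for a in sets(n) for b in sets(n)
               if all(x <= y for x, y in zip(a, b)))

def is_linear(f, n):
    """Linear iff the canonical polynomial has no product of two or more letters."""
    for s in sets(n):
        if sum(s) >= 2:
            coeff = 0
            for t in sets(n):
                if all(x <= y for x, y in zip(t, s)):
                    coeff ^= f(*t)
            if coeff:
                return False
    return True

OPERATIONS = {
    "negation":       (1, lambda x: 1 - x),
    "multiplication": (2, lambda x, y: x & y),
    "disjunction":    (2, lambda x, y: x | y),
    "addition":       (2, lambda x, y: x ^ y),
    "equivalence":    (2, lambda x, y: 1 - (x ^ y)),
    "implication":    (2, lambda x, y: (1 - x) | y),
    "inhibit":        (2, lambda x, y: x & (1 - y)),
    "Sheffer":        (2, lambda x, y: 1 - (x & y)),
    "Pierce":         (2, lambda x, y: 1 - (x | y)),
}

monotone = {name for name, (n, f) in OPERATIONS.items() if is_monotone(f, n)}
linear = {name for name, (n, f) in OPERATIONS.items() if is_linear(f, n)}
print(sorted(monotone))  # ['disjunction', 'multiplication']
print(sorted(linear))    # ['addition', 'equivalence', 'negation']
```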

It is not difficult to derive the following important result from Theorems 2 and 3.

Theorem  4.  With  the  use  of  any  nonlinear  operation  there  can  be 
obtained  either  the  multiplication  or  disjunction  operation. 

Let us consider an arbitrary nonlinear operation defined by the nonlinear boolean function $f(x_1, x_2, \ldots, x_n)$. The canonical polynomial of this function contains at least one product with two or more cofactors. Among all such products, let us single out one of those with the smallest length. This product contains no fewer than two cofactors and, consequently, has the form $x_i x_j p$, where p is the product of some set of letters (possibly empty or containing only a single letter). Keeping the letters $x_i$ and $x_j$ unchanged, we replace all the letters occurring in p by ones, and all the remaining letters (from among $x_1, x_2, \ldots, x_n$) by zeros. After this substitution the product we singled out becomes $x_i x_j$. All the remaining products of length greater than one become zero, since each of them contains at least one letter different from $x_i$, from $x_j$ and from the letters occurring in p. The products of length one turn into $x_i$, $x_j$ or the constants 0 and 1.

After this substitution we obtain a boolean function of two variables $\varphi(x_i, x_j)$ whose canonical polynomial has the form $x_i x_j + \alpha x_i + \beta x_j + \gamma$, where $\alpha$, $\beta$, $\gamma$ are boolean constants (0 or 1). If this function is equal to $x_i x_j$ or to $x_i \vee x_j = x_i x_j + x_i + x_j$, then the theorem is proved. Otherwise this function, being nonlinear, will inevitably be nonmonotone by Theorem 3, since every nonlinear function of two variables coincides, up to a permutation of its arguments, with one of the operations listed there. But then, by Theorem 2, the boolean operation defined by the function $\varphi$ allows us to express the negation $\bar{x} = x + 1$. Since the functions $x_i$, $x_j$, $x_i + 1$, $x_j + 1$ are available, we can replace $x_i$ by $x_i + \beta$ and $x_j$ by $x_j + \alpha$ in the function $\varphi$. We then obtain the function $\psi(x_i, x_j)$ with the canonical polynomial

$$(x_i + \beta)(x_j + \alpha) + \alpha(x_i + \beta) + \beta(x_j + \alpha) + \gamma = x_i x_j + \alpha x_i + \beta x_j + \alpha\beta + \alpha x_i + \alpha\beta + \beta x_j + \alpha\beta + \gamma = x_i x_j + \delta,$$

where the letter $\delta$ designates the boolean constant $\alpha\beta + \gamma$. If $\delta = 0$, then $\psi(x_i, x_j) = x_i x_j$, and the theorem is proved. If, however, $\delta = 1$, then $\psi(x_i, x_j) = x_i x_j + 1 = \overline{x_i x_j}$. Since we have already constructed the negation, it is again easy to obtain the product $x_i x_j$ from the last function.

Thus, in all cases the given operation allows us to construct expressions for the function $xy$ or $x \vee y$, and consequently also for the operations defined by them. Q.E.D.

Now it is easy to derive the following completeness condition for sets of boolean operations.
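The proof of Theorem 4 can likewise be carried out as a computation. This Python sketch proceeds in four steps:

1. It finds a shortest product of two or more letters in the canonical polynomial.
2. It performs the substitution described in the proof.
3. It reads off the constants α, β, γ from the values of φ.
4. It checks that ψ = x_i x_j + δ.

The names zhegalkin_monomials and theorem4_reduction are hypothetical:

```python
from itertools import product

def zhegalkin_monomials(f, n):
    """Products (tuples of variable indices) with coefficient 1 in the
    canonical polynomial of f, via the mod-2 Moebius transform."""
    result = []
    for s in product((0, 1), repeat=n):
        coeff = 0
        for t in product((0, 1), repeat=n):
            if all(a <= b for a, b in zip(t, s)):
                coeff ^= f(*t)
        if coeff:
            result.append(tuple(k for k in range(n) if s[k]))
    return result

def theorem4_reduction(f, n):
    """Returns (i, j, alpha, beta, gamma, delta) for a nonlinear f."""
    nonlinear = [m for m in zhegalkin_monomials(f, n) if len(m) >= 2]
    if not nonlinear:
        raise ValueError("f is linear")
    i, j, *rest = min(nonlinear, key=len)   # shortest product x_i x_j p

    def phi(xi, xj):
        # letters of p -> 1, all other letters -> 0, x_i and x_j kept
        args = [1 if k in rest else 0 for k in range(n)]
        args[i], args[j] = xi, xj
        return f(*args)

    gamma = phi(0, 0)
    alpha = phi(1, 0) ^ gamma
    beta = phi(0, 1) ^ gamma
    assert phi(1, 1) == 1 ^ alpha ^ beta ^ gamma  # the x_i x_j term survived
    delta = (alpha & beta) ^ gamma
    for a, b in product((0, 1), repeat=2):         # psi = x_i x_j + delta
        assert phi(a ^ beta, b ^ alpha) == (a & b) ^ delta
    return i, j, alpha, beta, gamma, delta

print(theorem4_reduction(lambda x, y: 1 - (x & y), 2))     # (0, 1, 0, 0, 1, 1)
print(theorem4_reduction(lambda a, b, c: (a & b & c) ^ a, 3))  # (0, 1, 1, 0, 0, 0)
```

The sketch evaluates ψ even when φ already equals $x_i x_j$ or $x_i \vee x_j$. In those cases the proof stops earlier, but the identity ψ = x_i x_j + δ still holds.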

Theorem 5. In order that a set of boolean operations be complete, it is necessary and sufficient that this set include at least one nonlinear operation and at least one nonmonotone operation.

The  necessity  of  this  condition  was  established  above,  and  the 
sufficiency  is  a  direct  result  of  Theorems  2  and  4. 
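Theorem 5 gives a finite test for completeness. Here is a minimal Python sketch of it. It assumes that operations are supplied as (arity, function) pairs over the integers 0 and 1, and that the constants 0 and 1 may be used freely, as in the ordinary notion of completeness. The names are illustrative:

```python
from itertools import product

def sets(n):
    return list(product((0, 1), repeat=n))

def is_monotone(f, n):
    return all(f(*a) <= f(*b) for a in sets(n) for b in sets(n)
               if all(x <= y for x, y in zip(a, b)))

def is_linear(f, n):
    # no product of two or more letters in the canonical polynomial
    return all(
        sum(f(*t) for t in sets(n) if all(x <= y for x, y in zip(t, s))) % 2 == 0
        for s in sets(n) if sum(s) >= 2
    )

def is_complete(ops):
    """Theorem 5: some operation is nonlinear and some is nonmonotone."""
    return (any(not is_linear(f, n) for n, f in ops)
            and any(not is_monotone(f, n) for n, f in ops))

NOT = (1, lambda x: 1 - x)
AND = (2, lambda x, y: x & y)
OR = (2, lambda x, y: x | y)
XOR = (2, lambda x, y: x ^ y)
print(is_complete([AND, NOT]))   # True
print(is_complete([AND, OR]))    # False: both operations are monotone
print(is_complete([XOR, NOT]))   # False: both operations are linear
```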

We agree to call a complete set of boolean operations irreducible if not a single operation can be excluded from it without the set losing its property of completeness. Theorem 5 makes it easy to list all the irreducible complete sets composed of single-place and two-place boolean operations. There are four such sets consisting of a single operation each: implication, inhibit, the Sheffer operation and the Pierce operation. There are also six irreducible complete sets consisting of two operations: multiplication combined with each of negation, addition and equivalence, and disjunction combined with each of the same three operations.
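The same test lets us generate the list of irreducible complete sets instead of deriving it by hand. This Python sketch encodes the inhibit, Sheffer and Pierce operations as $x\bar{y}$, $\bar{x} \vee \bar{y}$ and $\bar{x}\bar{y}$; that encoding is an assumption, as are the names. It keeps those complete subsets of the nine operations from which no single operation can be removed:

```python
from itertools import combinations, product

def sets(n):
    return list(product((0, 1), repeat=n))

def below(t, s):
    return all(a <= b for a, b in zip(t, s))

def is_monotone(f, n):
    return all(f(*a) <= f(*b) for a in sets(n) for b in sets(n) if below(a, b))

def is_linear(f, n):
    return all(sum(f(*t) for t in sets(n) if below(t, s)) % 2 == 0
               for s in sets(n) if sum(s) >= 2)

OPERATIONS = {
    "negation":       (1, lambda x: 1 - x),
    "multiplication": (2, lambda x, y: x & y),
    "disjunction":    (2, lambda x, y: x | y),
    "addition":       (2, lambda x, y: x ^ y),
    "equivalence":    (2, lambda x, y: 1 - (x ^ y)),
    "implication":    (2, lambda x, y: (1 - x) | y),
    "inhibit":        (2, lambda x, y: x & (1 - y)),
    "Sheffer":        (2, lambda x, y: 1 - (x & y)),
    "Pierce":         (2, lambda x, y: 1 - (x | y)),
}
NONLINEAR = {k for k, (n, f) in OPERATIONS.items() if not is_linear(f, n)}
NONMONOTONE = {k for k, (n, f) in OPERATIONS.items() if not is_monotone(f, n)}

def is_complete(names):
    """Theorem 5 applied to a collection of operation names."""
    return bool(NONLINEAR & set(names)) and bool(NONMONOTONE & set(names))

irreducible = [
    combo
    for size in range(1, len(OPERATIONS) + 1)
    for combo in combinations(OPERATIONS, size)
    if is_complete(combo)
    and not any(is_complete(combo[:k] + combo[k + 1:]) for k in range(size))
]
print(len(irreducible))   # 10
for combo in irreducible:
    print(" + ".join(combo))
```

It is enough to test single-operation removals, because adding operations to a complete set never destroys its completeness.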

The concept of completeness which we have used so far allowed us to build boolean functions from the arguments of these functions, the operations of the corresponding complete set, and also the boolean constants 0 and 1. If we exclude the possibility of using the constants, a new concept of completeness arises, which we shall term strong completeness, or completeness in the strong sense.

By no means all complete sets of boolean operations satisfy the condition of strong completeness. For example, the set consisting of the operations of addition and multiplication is complete, yet it is not complete in the strong sense. It is easy to see that without the use of the constant 1, all the boolean functions constructed with this set of operations vanish at the point at which all their arguments take zero values. Thus, with the operations of addition and multiplication alone (without the constant 1), a whole series of boolean functions cannot be represented, for example the function $\bar{x}$ or the function identically equal to unity.

At the same time, the set composed of negation and multiplication and the set composed of negation and disjunction are complete not only in the conventional sense but also in the strong sense. To convince ourselves of this, it is obviously sufficient to prove that the constants 0 and 1 can be represented with the aid of the operations indicated. This is done by the formulas $0 = x\bar{x}$, $1 = \overline{x\bar{x}}$ and $1 = x \vee \bar{x}$, $0 = \overline{x \vee \bar{x}}$.
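Both observations can be checked by computing every function of two variables x and y that can be obtained from x and y alone, without constants, using a given set of operations. In this Python sketch a function is represented by its tuple of values on the sets (00), (01), (10), (11). The helper name closure is illustrative:

```python
from itertools import product

POINTS = list(product((0, 1), repeat=2))    # sets of values of (x, y)
X = tuple(p[0] for p in POINTS)             # truth table of x
Y = tuple(p[1] for p in POINTS)             # truth table of y

def closure(unary, binary):
    """Truth tables of all functions of x, y expressible by formulas over
    the given operations WITHOUT the constants 0 and 1."""
    found = {X, Y}
    while True:
        new = {tuple(op(a) for a in f) for f in found for op in unary}
        new |= {tuple(op(a, b) for a, b in zip(f, g))
                for f in found for g in found for op in binary}
        if new <= found:
            return found
        found |= new

NOT = lambda a: 1 - a
AND = lambda a, b: a & b
OR = lambda a, b: a | b
XOR = lambda a, b: a ^ b

ring = closure([], [XOR, AND])
print(len(ring), all(t[0] == 0 for t in ring))   # 8 True: all vanish at (0, 0)
print(len(closure([NOT], [AND])), len(closure([NOT], [OR])))   # 16 16
```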

The necessary and sufficient conditions for the strong completeness of sets of boolean operations were found by Post [62]. To formulate these conditions, we need to become acquainted with three more remarkable classes of boolean functions and the operations which they define.

The boolean function (operation) $f(x_1, x_2, \ldots, x_n)$ is termed:

- a zero-preserving function (operation) if $f(0, 0, \ldots, 0) = 0$;
- a unity-preserving function (operation) if $f(1, 1, \ldots, 1) = 1$;
- a self-dual function (operation) if $f(x_1, x_2, \ldots, x_n) = \overline{f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)}$.

Post's result mentioned above can now be formulated as follows.

Theorem 6. In order that a set of boolean operations be complete in the strong sense, it is necessary and sufficient that this set include at least one nonlinear operation, at least one nonmonotone operation, at least one non-zero-preserving operation, at least one non-unity-preserving operation, and at least one operation which is not self-dual.
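Post's criterion can also be tested mechanically. In the following Python sketch, operations are given as (arity, function) pairs over the integers 0 and 1, and the names post_profile and strongly_complete are hypothetical. The sketch determines, for each operation, which of the five classes contain it. It then checks that each class has at least one operation of the set lying outside it:

```python
from itertools import product

CLASSES = ("linear", "monotone", "zero-preserving", "unity-preserving", "self-dual")

def sets(n):
    return list(product((0, 1), repeat=n))

def below(t, s):
    return all(a <= b for a, b in zip(t, s))

def post_profile(f, n):
    """Which of Post's five classes contain f (a function of n arguments)."""
    return {
        "linear": all(sum(f(*t) for t in sets(n) if below(t, s)) % 2 == 0
                      for s in sets(n) if sum(s) >= 2),
        "monotone": all(f(*a) <= f(*b)
                        for a in sets(n) for b in sets(n) if below(a, b)),
        "zero-preserving": f(*[0] * n) == 0,
        "unity-preserving": f(*[1] * n) == 1,
        "self-dual": all(f(*p) == 1 - f(*[1 - v for v in p]) for p in sets(n)),
    }

def strongly_complete(ops):
    """Theorem 6: for each class, some operation of the set lies outside it."""
    profiles = [post_profile(f, n) for n, f in ops]
    return all(any(not p[c] for p in profiles) for c in CLASSES)

NOT = (1, lambda x: 1 - x)
AND = (2, lambda x, y: x & y)
XOR = (2, lambda x, y: x ^ y)
SHEFFER = (2, lambda x, y: 1 - (x & y))
PIERCE = (2, lambda x, y: 1 - (x | y))
print(strongly_complete([SHEFFER]))    # True
print(strongly_complete([XOR, AND]))   # False: both preserve zero
print(strongly_complete([NOT, AND]))   # True
```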

The necessity of the conditions formulated in Theorem 6 is proved by exactly the same method as for conventional completeness. We need only convince ourselves (and this is not difficult) that, without the constants, zero-preserving operations can produce only boolean functions (and hence boolean operations) which also preserve zero. The situation is the same with the operations which preserve unity and with the self-dual operations. The proof of sufficiency reduces to showing that the constants 0 and 1 can be constructed and then applying Theorem 5. The details of this proof can be found in the article of Yablonskii [83] (see also Glushkov [26]).

Of the nine single-place and two-place boolean operations listed above, five are not zero-preserving: negation, equivalence, implication, and also the Sheffer and Pierce operations.

The list of operations which are not unity-preserving also contains five operations: negation, addition, and the inhibit, Sheffer and Pierce operations.

Finally, all the operations other than negation are not self-dual: multiplication, disjunction, addition, implication, and also the operations of equivalence, inhibit, Sheffer and Pierce.
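These three lists can be reproduced with a short Python sketch. The sketch assumes the inhibit, Sheffer and Pierce operations are $x\bar{y}$, $\bar{x} \vee \bar{y}$ and $\bar{x}\bar{y}$; choosing the other argument order gives the same lists. The helper names are illustrative:

```python
from itertools import product

OPERATIONS = {
    "negation":       (1, lambda x: 1 - x),
    "multiplication": (2, lambda x, y: x & y),
    "disjunction":    (2, lambda x, y: x | y),
    "addition":       (2, lambda x, y: x ^ y),
    "equivalence":    (2, lambda x, y: 1 - (x ^ y)),
    "implication":    (2, lambda x, y: (1 - x) | y),
    "inhibit":        (2, lambda x, y: x & (1 - y)),
    "Sheffer":        (2, lambda x, y: 1 - (x & y)),
    "Pierce":         (2, lambda x, y: 1 - (x | y)),
}

def zero_preserving(f, n):
    return f(*[0] * n) == 0

def unity_preserving(f, n):
    return f(*[1] * n) == 1

def self_dual(f, n):
    return all(f(*p) == 1 - f(*[1 - v for v in p])
               for p in product((0, 1), repeat=n))

def outside(predicate):
    """Names of the operations NOT belonging to the given class."""
    return [k for k, (n, f) in OPERATIONS.items() if not predicate(f, n)]

for label, predicate in (("not zero-preserving", zero_preserving),
                         ("not unity-preserving", unity_preserving),
                         ("not self-dual", self_dual)):
    names = outside(predicate)
    print(f"{label} ({len(names)}): {', '.join(names)}")
```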

The Sheffer and Pierce operations possess a most remarkable property: each of them, taken by itself, forms a set of boolean operations that is complete in the strong sense. These sets, of course, are irreducible in the sense that no operation can be removed from them without the set losing the property of strong completeness.

It is easy to show that every operation which is not zero-preserving is either also not unity-preserving or not self-dual. Indeed, for a self-dual function $f(1, 1, \ldots, 1) = \overline{f(0, 0, \ldots, 0)}$, so a self-dual function with $f(0, \ldots, 0) = 1$ has $f(1, \ldots, 1) = 0$. This implies that an irreducible strongly complete set of boolean operations cannot contain more than four different operations (and not five, as it might seem a priori). Irreducible strongly complete sets consisting of four different boolean operations do actually exist.
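One set of four operations attaining this bound consists of the constants 0 and 1, multiplication $xy$, and the three-place addition $x + y + z$. This is a standard example chosen here for illustration; it is not taken from the text. The Python sketch below checks Post's conditions for this set and confirms that removing any one operation destroys strong completeness. The function names are hypothetical:

```python
from itertools import product

CLASSES = ("linear", "monotone", "zero-preserving", "unity-preserving", "self-dual")

def sets(n):
    return list(product((0, 1), repeat=n))

def below(t, s):
    return all(a <= b for a, b in zip(t, s))

def post_profile(f, n):
    return {
        "linear": all(sum(f(*t) for t in sets(n) if below(t, s)) % 2 == 0
                      for s in sets(n) if sum(s) >= 2),
        "monotone": all(f(*a) <= f(*b)
                        for a in sets(n) for b in sets(n) if below(a, b)),
        "zero-preserving": f(*[0] * n) == 0,
        "unity-preserving": f(*[1] * n) == 1,
        "self-dual": all(f(*p) == 1 - f(*[1 - v for v in p]) for p in sets(n)),
    }

def strongly_complete(ops):
    profiles = [post_profile(f, n) for n, f in ops]
    return all(any(not p[c] for p in profiles) for c in CLASSES)

BASIS = [
    (0, lambda: 0),                  # constant 0 (zero-place operation)
    (0, lambda: 1),                  # constant 1
    (2, lambda x, y: x & y),         # multiplication
    (3, lambda x, y, z: x ^ y ^ z),  # three-place addition
]
print(strongly_complete(BASIS))   # True
print([strongly_complete(BASIS[:k] + BASIS[k + 1:]) for k in range(4)])
# [False, False, False, False]: every operation is indispensable
```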

§4. APPLICATION OF BOOLEAN ALGEBRA IN THE THEORY OF COMBINATION CIRCUITS

Combination circuits are the simplest technical devices for the conversion of discrete information. Let us assume that we have at our disposal a finite number of types of signals of a particular nature (mechanical, electrical, optical, etc.) composing the so-called signal alphabet S. We shall use the term combination circuit for any device P which realizes some alphabetic operator $A = A(S)$ in the alphabet S and satisfies the following conditions.

1. The domain of definition of the operator A is the set of words in the alphabet S having a fixed length $m \ge 1$ (depending on the choice of the device P).

2. All the input words from the domain of definition of the operator A are transformed by the circuit P into output words of the same length $n \ge 1$ (also depending on the choice of P).

3. All the letters (signals) composing the input word are applied simultaneously to m points of the circuit P, which are called its input poles. At the same moment, all the letters (signals) of the corresponding output word appear, likewise simultaneously, at another n points of the circuit, which are called its output poles. The input and output poles are numbered in a strictly fixed way and are associated with the corresponding positions of the input and output words: the i-th letter of the input word is applied to the i-th input pole ($i = 1, 2, \ldots, m$), and the j-th letter of the output word appears at the j-th output pole ($j = 1, 2, \ldots, n$).

Of course, every real technical device has some internal delay, so the condition that the input and output signals of a combination circuit appear simultaneously should not be taken too literally. We are considering an abstraction of the case actually encountered, in which this delay can be neglected in comparison with the interval of discrete operation of the circuit. That interval is determined by the time needed to replace one input word by another.

In practice, combination circuits are usually characterized by the absence of memory. This means that the output word appears at the output poles of the circuit only as long as the corresponding input word is applied to the input poles. After the application of the input signals has been terminated, the circuit "forgets" these signals, so they cannot affect the formation of the circuit's response to the next combination of signals applied to its input poles.

At first glance, conditions 1 and 2 impose very strong limitations on the alphabetic operators which can be realized by combination circuits. In actuality, however, words of a single length (selected each time in accordance with the specific conditions) can be used to code any finite ensemble of words.

Thus, with suitable coding, combination circuits can realize any alphabetic operators with finite domains of definition.

The simplest technique for equalizing the lengths of any fixed set of words by coding consists in appending to the shorter words (repeatedly, generally speaking) a special "empty" letter introduced into the alphabet for this purpose. The shorter words are padded in this way until they have as many letters as the longest word of the set in question. Of course, other techniques for resolving this problem are possible.
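The padding technique can be sketched in a few lines of Python. In this sketch the underscore plays the role of the specially introduced empty letter. That choice, like the function names, is an assumption. Decoding works only because this letter does not occur in the original words:

```python
EMPTY = "_"   # hypothetical "empty" letter added to the alphabet for padding

def equalize(words, empty=EMPTY):
    """Pad every word on the right with the empty letter up to the length
    of the longest word, so that all words have the same length."""
    width = max(len(w) for w in words)
    return [w + empty * (width - len(w)) for w in words]

def decode(word, empty=EMPTY):
    """Recover the original word by stripping the trailing empty letters."""
    return word.rstrip(empty)

padded = equalize(["ab", "a", "abc"])
print(padded)                        # ['ab_', 'a__', 'abc']
print([decode(w) for w in padded])   # ['ab', 'a', 'abc']
```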

We note also that, with a suitable interpretation of how combination circuits operate, the same combination circuit can be regarded as realizing not one but any finite set of alphabetic operators. To accomplish this, it is sufficient to divide all the input poles of the circuit into so-called information poles and control poles. Suppose we count as the transformed input word only the combination of input signals applied to the information poles. Then, by fixing various control words (i.e., words applied to the control poles), we obtain different alphabetic operators, which associate output words of the circuit with information input words.
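To illustrate control poles, the following Python sketch models a hypothetical boolean (4, 1)-terminal network. It has two control poles c1, c0 and two information poles x, y. The particular assignment of control words to operations is invented for the example:

```python
from itertools import product

def selector_circuit(c1, c0, x, y):
    """Hypothetical (4, 1)-terminal network: the output is one fixed boolean
    function of all four input signals."""
    n1, n0 = 1 - c1, 1 - c0
    return ((n1 & n0 & x & y)                 # control word 00: multiplication
            | (n1 & c0 & (x | y))             # control word 01: disjunction
            | (c1 & n0 & (x ^ y))             # control word 10: addition
            | (c1 & c0 & (1 - (x ^ y))))      # control word 11: equivalence

def operator_for(control):
    """Fix the control word; the result acts on information words only."""
    return lambda x, y: selector_circuit(*control, x, y)

for control in product((0, 1), repeat=2):
    op = operator_for(control)
    print(control, [op(x, y) for x, y in product((0, 1), repeat=2)])
```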

Varying the alphabetic operators realized by a combination circuit through control words is completely analogous to the technique described in the first chapter for organizing the operation of a universal algorithm. There, the input of the universal algorithm receives not only the information word to be transformed but also a control word, for which we select the representation of the specific algorithm to be realized.

For technical reasons it is simplest and most convenient to select the binary alphabet as the signal alphabet. In this case the two types of signals are usually identified with the boolean constants 0 and 1. We shall term combination circuits with such a signal alphabet binary, or boolean, combination circuits.

In a binary combination circuit each output signal is some boolean function of the input signals of the circuit. Suppose the circuit has m input and n output poles. Then the alphabetic operator it realizes is completely characterized by a system of n boolean functions of m variables, which give the output signal on each of the n output poles as a function of the signals on the m input poles. We shall term this system of functions the output functions of the circuit in question, and the circuit itself will be termed a boolean (m, n)-terminal network.

The results of the preceding section lay the theoretical basis for one of the primary problems of the theory of boolean (m, n)-terminal networks, the problem of their synthesis. The essence of the synthesis problem, for combination circuits in general and for boolean (m, n)-terminal networks in particular, is to develop methods for constructing circuits of any desired complexity. Such circuits are built from a fixed, usually quite small, number of types of elementary combination circuits; in the case of binary circuits these are called logic elements.

Any boolean (m, 1)-terminal network can be selected as a logic element. In view of what we have said above, its operation can be characterized by the output function $f(x_1, x_2, \ldots, x_m)$. This is a boolean function of m variables which gives the single output signal of the selected element as a function of the ensemble of all its input signals. We say that the selected logic element realizes this boolean function or, correspondingly, realizes the boolean operation defined by this function.

Let us assume now that some set of logic elements has been selected. Synthesizing a combination circuit from the elements of this set amounts to connecting, step by step, the output poles of some elements to the input poles of other elements. Two restrictions apply. First, several output poles must not be connected to the same input pole. Second, no closed circuits may be formed along which a signal leaving some element Q, possibly after passing through other elements, could arrive again at one of the input poles of the same element Q. Here we shall assume that an unlimited number of copies of every element of the selected set is at our disposal. Thus there is never a shortage in the quantity of logic elements, though the number of types remains limited.

When this process of connecting the output poles of some elements to the input poles of others has been completed, some set M of input poles and some set N of output poles remain free of any connections with other poles. It is now natural to take the set M as the set of input poles and the set N as the set of output poles of the complex circuit constructed in this way.


If we have observed the limitations presented above while connecting the poles, then the constructed circuit gives an output signal on each of the n poles of the set N as a completely determined boolean function of the signals on all m poles of the set M. Therefore we can consider it a combination circuit in the binary alphabet or, more precisely, a boolean (m, n)-terminal network.
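The connection rules above can be sketched as a small netlist evaluator in Python: every input pole has exactly one source, and no closed loops are allowed. The netlist format and the element names are assumptions made for the example. The standard graphlib module (Python 3.9 or later) computes an evaluation order, and it raises CycleError if a closed loop has been formed:

```python
from graphlib import TopologicalSorter, CycleError

ELEMENTS = {                       # logic elements and their output functions
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}

def evaluate(netlist, outputs, inputs):
    """netlist maps an element name to (element type, sources of its input
    poles); a source is another element's name or a free input pole. Each
    input pole has exactly one source, so no two output poles share it."""
    deps = {name: [s for s in sources if s in netlist]
            for name, (_, sources) in netlist.items()}
    values = dict(inputs)
    # static_order() raises CycleError if the connections form a closed loop
    for name in TopologicalSorter(deps).static_order():
        kind, sources = netlist[name]
        values[name] = ELEMENTS[kind](*(values[s] for s in sources))
    return tuple(values[o] for o in outputs)

# Outputs x + y and xy built from AND, OR and NOT elements.
HALF_ADDER = {
    "g_or": ("OR", ["x", "y"]),
    "g_and": ("AND", ["x", "y"]),
    "g_not": ("NOT", ["g_and"]),
    "sum": ("AND", ["g_or", "g_not"]),
}
for x in (0, 1):
    for y in (0, 1):
        print(x, y, evaluate(HALF_ADDER, ["sum", "g_and"], {"x": x, "y": y}))
```

The example network has two free input poles and two output poles, so it is a boolean (2, 2)-terminal network. Its first output realizes addition x + y, and its second output realizes multiplication xy.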

It is easy to see that the set N of output poles of the circuit can be supplemented by some of the poles which have been connected. We shall term such poles the internal nodes of the circuit.

With certain specific physical realizations of binary signals, we can connect several output poles to the same input pole. No ambiguity results from several signals arriving at the same pole, owing to the so-called natural separation of signals. Natural separation means that a zero signal is formed on a particular pole when and only when all the signals arriving simultaneously at this pole are equal to zero. If, however, even one of the arriving signals is equal to one, then the combined signal is equal to one. In this case the input signals of the circuit can evidently also be applied to certain of its internal nodes, and these nodes are then included in the set M of input poles of the circuit.
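Natural separation amounts to forming a disjunction at the pole itself. The following minimal Python sketch wires the outputs of two AND elements to a single pole. The function names are hypothetical:

```python
from itertools import product

def wired_node(*signals):
    """Natural separation: the combined signal on a pole is 0 exactly when
    every signal arriving at it is 0."""
    return int(any(signals))

def two_products_on_one_pole(x, y, z, u):
    # outputs of two AND elements wired to one pole give xy v zu
    # without any OR element
    return wired_node(x & y, z & u)

print(all(two_products_on_one_pole(*p) == ((p[0] & p[1]) | (p[2] & p[3]))
          for p in product((0, 1), repeat=4)))   # True
```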

Suppose the synthesized circuit has a single output pole and is thus characterized by a single output boolean function $f(x_1, x_2, \ldots, x_m)$. Then building the circuit by connecting nodes one after another essentially repeats the step-by-step construction of a formula representing $f(x_1, x_2, \ldots, x_m)$ with the operations realized by the logic elements used. The synthesis of an arbitrary boolean (m, 1)-terminal network is possible if the set of these operations is strongly complete. Every (m, n)-terminal network can be made up of n individual (m, 1)-terminal networks. Hence, when the condition of strong completeness is satisfied, arbitrary binary combination circuits can be constructed.
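The Sheffer operation alone is complete in the strong sense, so a circuit built only from Sheffer elements needs no constant channels at all. The Python sketch below expresses negation, multiplication, disjunction and both constants through the Sheffer operation and checks them on all inputs. The function names are illustrative:

```python
from itertools import product

def sheffer(x, y):
    """Sheffer operation: equals 0 only when both arguments are 1."""
    return 1 - (x & y)

def negation(x):
    return sheffer(x, x)

def conjunction(x, y):
    return negation(sheffer(x, y))

def disjunction(x, y):
    return sheffer(negation(x), negation(y))

def const_one(x):
    return sheffer(x, negation(x))   # identically 1, built from an ordinary input

def const_zero(x):
    return negation(const_one(x))    # identically 0

for x, y in product((0, 1), repeat=2):
    assert conjunction(x, y) == (x & y) and disjunction(x, y) == (x | y)
    assert negation(x) == 1 - x and const_one(x) == 1 and const_zero(x) == 0
print("AND, OR, NOT and both constants realized with Sheffer elements only")
```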

In practice, however, it usually turns out to be easy to supply the circuit being synthesized with signals that are identically (at all instants of time) equal to zero and to one, using channels specially assigned for this purpose. Moreover, the zero signal frequently needs no special channel at all. With some physical realizations of the signals, a zero signal appears on every isolated input pole, i.e., one not connected to anything. In this case the condition for synthesizing an arbitrary binary combination circuit is no longer strong completeness but rather ordinary completeness of the set of operations realized by the selected logic elements. In this case, for brevity, we speak of the completeness or incompleteness of the set of logic elements themselves, rather than of the boolean operations realized by them.

Among the logic elements most frequently used in practice are the so-called AND and OR elements, which realize respectively the boolean operations of multiplication and disjunction. As a rule, the two-input AND and OR elements realize the functions $xy$ and $x \vee y$. Alongside them, wide use is made of multi-input variants of these elements, which realize the boolean functions $x_1 x_2 \ldots x_n$ and $x_1 \vee x_2 \vee \ldots \vee x_n$.

The boolean (1, 1)-terminal network which performs the negation operation also frequently figures among the logic elements under the name of inverter. In so-called potential circuits, the signals 0 and 1 are realized by two different levels of electrical potential. There the AND and OR circuits can be constructed from resistors and semiconductor diodes, and the inverter from resistors and semiconductor triodes (transistors).

When two-input AND, OR and NOT (inverter) elements are used as the set of logic elements, synthesizing boolean (m, 1)-terminal networks reduces to constructing formulas of boolean algebra that represent the output functions of these networks. The point of interest is not to build just any circuit with the given output function; in view of what we have just said, that is not difficult. The point is rather to build an adequately economical circuit which uses the smallest possible number of logic elements. In this case the problem of constructing economical circuits reduces to the problem of minimizing formulas of boolean algebra.

Quite frequently in practice, when constructing a particular combination circuit, we can apply to its input poles not only the input signals of interest $x_1, x_2, \ldots, x_n$ but also their negations $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$. In this case AND and OR elements alone clearly suffice for synthesizing the circuit. The problem of constructing sufficiently economical circuits is then usually solved only within the class of so-called two-stage circuits, i.e., circuits in which all AND elements precede the OR elements or, conversely, all OR elements precede all AND elements. Such circuits are obviously described by disjunctive or conjunctive normal forms, whose minimization methods were discussed in §2 of the present chapter.

As an example of synthesizing a two-stage combination circuit, let us consider the boolean (6, 1)-terminal network with the output function f = xyzvxyvxyzvyzvzz. We assume that the signals x, y, z, $\bar{x}$, $\bar{y}$, $\bar{z}$ are applied to the six input poles of our circuit, and that two-input AND and OR elements are selected as the logic elements.

If we design the circuit in strict accordance with the originally given formula for the function f, then the circuit will contain 7 AND elements and 4 OR elements. If, however, we minimize this formula by the Blake method, as was done (precisely for this formula) at the end of §2, we find that the given function f can be represented by a far simpler formula: f=yz\/x2. The circuit corresponding to this formula contains in all two AND elements and one OR element. Representing the AND and OR elements by circles with the letters C and P inside, we can depict the constructed circuit visually (Fig. 5). In the constructed circuit the input poles to which the signals x and y are applied are actually not used. Therefore the given output function can be realized by a boolean (4, 1)-terminal network rather than by the (6, 1)-terminal network assumed initially.
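The element counts quoted above follow a simple rule for two-input elements, provided no subproducts are shared. A product of k letters needs k - 1 AND elements, and a disjunction of t products needs t - 1 OR elements. Below is a small Python sketch of this count. The function name is hypothetical. It assumes the products of the original formula have 3, 2, 3, 2 and 2 letters and those of the minimized formula have 2 and 2, as the printed formulas suggest:

```python
def two_stage_cost(term_lengths):
    """Two-input AND and OR elements needed for a disjunction of products
    when every letter (and its negation) is available at an input pole
    and no subproducts are shared between products."""
    and_elements = sum(k - 1 for k in term_lengths)
    or_elements = len(term_lengths) - 1
    return and_elements, or_elements

print(two_stage_cost([3, 2, 3, 2, 2]))  # (7, 4): the original formula
print(two_stage_cost([2, 2]))           # (2, 1): the minimized formula
```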

In this example the output function of the circuit to be synthesized was given in the form of a formula of boolean algebra, so that the synthesis process reduced in essence to the simplification of this formula. In practice we most frequently encounter the case when the output functions of the circuit to be synthesized are given by tables of their values. In this case the first stage of circuit synthesis is finding some (not necessarily the simplest) formulas which represent the given functions. A universal technique for such construction is the method based on the ideal disjunctive normal forms (see §2): any boolean function can be represented as the disjunction of the constituents of unity corresponding to those sets of values of the variables on which this function becomes unity. The representation obtained is then subjected to minimization.
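The construction of the ideal disjunctive normal form from a table of values can be sketched in a few lines of Python. This is an illustrative sketch, not the book's notation: the function is supplied as a Python callable, a negated literal is written as "~x", and the names `perfect_dnf` and `eval_dnf` are hypothetical.

```python
from itertools import product


def perfect_dnf(f, names):
    """Constituents of unity of f: one product per set of values on which f = 1.

    A literal is the variable name, or "~" + name for its negation.
    """
    terms = []
    for values in product((0, 1), repeat=len(names)):
        if f(*values):
            terms.append([n if v else "~" + n for n, v in zip(names, values)])
    return terms


def eval_dnf(terms, env):
    """Value (0 or 1) of a disjunction of products on a dict of variable values."""
    def literal(lit):
        return 1 - env[lit[1:]] if lit.startswith("~") else env[lit]
    return int(any(all(literal(lit) for lit in term) for term in terms))


xor_terms = perfect_dnf(lambda x, y: x ^ y, ["x", "y"])
print(" ∨ ".join("".join(t) for t in xor_terms))  # ~xy ∨ x~y
```

The resulting form is correct but usually far from minimal, which is why it is only the starting point for minimization.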

In the case when the number of variables is relatively small, it is convenient to search for the irreducible and even the minimal disjunctive normal forms representing the given functions directly from the tables of the values of these functions. To facilitate this search, use is made of special forms of writing these tables, the so-called Karnaugh maps (Veitch diagrams).

The Karnaugh map is a table with four rows designated by the various sets of values of the first two variables x and y and with four columns designated by the various sets of values of the last two variables z, u. The map field (for the case of four variables) is thus divided into 16 squares, which are numbered with the numbers from 0 to 15 inclusive. The Karnaugh map for four variables:

                  zu
             00   01   11   10
     xy 00    0    1    3    2
        01    4    5    7    6
        11   12   13   15   14
        10    8    9   11   10

In using this map to specify a particular boolean function f(x, y, z, u), in each square there is written the value of this function (0 or 1) on that set of values of the variables whose number coincides with the number of the given square, the number of a set being the binary number formed by the values of x, y, z, u (the first two elements of the set designate the row and the second two elements the column at whose intersection the square in question is located).
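The numbering of the four-variable map can be generated rather than memorized. A minimal Python sketch, assuming (consistently with the neighbouring squares named in this section) that the row labels (x, y) and column labels (z, u) both run in the Gray-code order 00, 01, 11, 10; the names `GRAY2`, `karnaugh_grid` and `square_values` are hypothetical.

```python
GRAY2 = [0b00, 0b01, 0b11, 0b10]  # label order for rows (x, y) and columns (z, u)


def karnaugh_grid():
    """Square numbers of the 4 x 4 map, row by row.

    The number of a square is the binary number formed by (x, y, z, u);
    rows are labelled by (x, y) and columns by (z, u) in Gray-code order.
    """
    return [[4 * row + col for col in GRAY2] for row in GRAY2]


def square_values(number):
    """Values of (x, y, z, u) on the set of values with the given number."""
    return tuple((number >> shift) & 1 for shift in (3, 2, 1, 0))


for line in karnaugh_grid():
    print(" ".join(f"{n:2d}" for n in line))
```

The printed rows are 0 1 3 2, 4 5 7 6, 12 13 15 14 and 8 9 11 10. The Gray order is the design choice that makes squares which stand in a line differ in exactly one variable.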

With this formulation, in the case of a partial boolean function the squares of the map are of three kinds: those designated with zero, those designated with unity, and those not designated at all. The last correspond to those sets on which the values of the function in question are not defined. In the case of everywhere-defined boolean functions, every square contains either zero or unity; therefore these functions can be specified by indicating only those squares in which ones are written, or, as we shall say, by indicating the ones configuration of the function in question.

The Karnaugh map is constructed so that the ones configurations which give the various elementary products are recognized very simply. For the case of four variables considered here, the ones configurations of the elementary products of length 4 (constituents of unity) reduce to single squares, or, as we shall say here, to the elementary squares of the Karnaugh map.

The configurations which give all $C_4^3 \cdot 2^3 = 32$ elementary products of length 3 are all possible pairs of elementary squares standing in a line and thus composing a 2 × 1 rectangle. It is only necessary to mentally identify the opposite edges of the Karnaugh map: the upper with the lower and the left with the right. As a result of this identification it is necessary to consider, for example, that the elementary squares with numbers 4 and 6 or with numbers 0 and 8 stand in a line, while the elementary squares 5 and 9 or 7 and 2 must not be considered as standing in a line.
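The effect of identifying opposite edges can be checked by treating the map as a torus. A minimal Python sketch, assuming the Gray-ordered layout of rows (x, y) and columns (z, u); the helper names `position` and `stand_in_line` are hypothetical. It confirms the examples in the text and that there are exactly 32 pairs of squares standing in a line, one per elementary product of length 3.

```python
GRAY2 = [0, 1, 3, 2]  # Gray-code label order 00, 01, 11, 10


def position(number):
    """(row, column) of a square; the row is given by (x, y), the column by (z, u)."""
    return GRAY2.index(number >> 2), GRAY2.index(number & 0b11)


def stand_in_line(a, b):
    """Neighbours on the map once opposite edges are identified (a torus)."""
    (r1, c1), (r2, c2) = position(a), position(b)
    dr, dc = (r1 - r2) % 4, (c1 - c2) % 4
    return (dr in (1, 3) and dc == 0) or (dc in (1, 3) and dr == 0)


pairs = [(a, b) for a in range(16) for b in range(a + 1, 16) if stand_in_line(a, b)]
print(len(pairs))  # 32
```

With this layout, two squares stand in a line exactly when their numbers differ in a single binary digit, that is, when the corresponding sets differ in the value of one variable.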

Similarly, the ones configurations which give all $C_4^2 \cdot 2^2 = 24$ elementary products of length 2 are all possible combinations of four elementary squares forming (4 × 1)-rectangles and (2 × 2)-squares, and for all $C_4^1 \cdot 2 = 8$ elementary products of length 1 the corresponding configurations are all possible (4 × 2)-rectangles of elementary squares. Here we must not forget the identification of the opposite edges of the Karnaugh map.

The elementary product corresponding to any of the ones configurations listed above is easily found, since such a product is composed of all those and only those cofactors (x, x̄, y, ȳ, z, z̄, u, ū) which become unity on all the sets of values of the variables covered by the given configuration.

Using this rule it is easy to find, for example, the elementary product xyz corresponding to a configuration of two elementary squares, and that to the configuration (a (2 × 2)-square!) consisting of the elementary squares 0, 2, 8, 10 there corresponds the elementary product ȳū.
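The rule for reading off the product of a configuration translates directly into code. A minimal Python sketch, assuming square numbers are the binary values of (x, y, z, u) and writing a negated cofactor as "~y"; the function name `product_for` is hypothetical.

```python
def product_for(squares, names=("x", "y", "z", "u")):
    """Elementary product corresponding to a ones configuration.

    The product consists of exactly those cofactors (x, ~x, y, ~y, ...)
    that equal 1 on every square of the configuration.
    """
    literals = []
    for i, name in enumerate(names):
        shift = len(names) - 1 - i
        bits = {(s >> shift) & 1 for s in squares}
        if bits == {1}:
            literals.append(name)
        elif bits == {0}:
            literals.append("~" + name)
    return "".join(literals)


print(product_for([0, 2, 8, 10]))  # ~y~u
print(product_for([14, 15]))       # xyz
```

A variable that takes both values over the configuration drops out of the product, which is why larger rectangles give shorter products.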

When the boolean function f is given by the Karnaugh map, finding the irreducible and minimal disjunctive normal forms which represent this function reduces to finding the most economical coverings of the ones configuration which gives the function f by the ones configurations described above, which correspond to the elementary products of different lengths (see §2).

Let us consider as an example the problem of finding such a minimal covering for the boolean function f given by the Karnaugh map.

It  is  assumed  that  in  the  squares  in  which 
there  are  dashes  the  values  of  the  function  f  can 
be  arbitrary,  so  that  if  in  the  formation  of  a 
particular  desired  configuration  it  is  necessary 
to  place  a  one  in  a  particular  one  of  these 
squares  this  can  always  be  done. 

It is easy to see that every (4 × 2)-rectangle which can be constructed on the given map includes at least one zero of the function f. This means that among the elementary products of length 1 there is no implicant of the function in question. There are two (2 × 2)-squares which do not contain zeros of the function f: the "square" consisting of the four corner elementary squares and the square standing in the lower right corner of the map (it contains three ones and one dash). Together, these squares cover all the ones of the function f, and therefore this function (up to the indifferent values designated by the dashes) can be represented as the disjunction of the corresponding elementary products: f = ȳū ∨ xz.

The disjunctive normal form found is, as is easy to see, minimal for the function f under any possible interpretation of the indifferent values designated by the dashes.
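Checking a proposed covering is a matter of set arithmetic on square numbers. A minimal Python sketch, assuming square numbers are the binary values of (x, y, z, u); the names `squares_of` and `covers` are hypothetical, and since the map of the example is not reproduced here, the ones and zeros in the usage lines are invented for illustration only.

```python
def squares_of(term, names=("x", "y", "z", "u")):
    """Set of square numbers on which an elementary product equals 1.

    `term` is a list of literals such as ["~y", "~u"].
    """
    result = set()
    for number in range(2 ** len(names)):
        env = {name: (number >> (len(names) - 1 - i)) & 1 for i, name in enumerate(names)}
        if all((1 - env[lit[1:]]) if lit.startswith("~") else env[lit] for lit in term):
            result.add(number)
    return result


def covers(terms, ones, zeros):
    """True if the products cover every one and no zero of the function.

    Squares in neither set are the indifferent (dash) squares.
    """
    covered = set().union(*(squares_of(t) for t in terms))
    return ones <= covered and not (covered & zeros)


print(sorted(squares_of(["~y", "~u"])))  # [0, 2, 8, 10]: the four corners
print(sorted(squares_of(["x", "z"])))    # [10, 11, 14, 15]: lower right (2 x 2)-square
# Hypothetical data: square 10 left as a dash.
print(covers([["~y", "~u"], ["x", "z"]], {0, 2, 8, 11, 14, 15}, {1, 4, 5, 6}))  # True
```

A dash square may be covered or not without affecting the result, which is exactly the freedom the dashes give in forming larger configurations.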

The described technique for directly finding the minimal disjunctive forms is applicable not only to boolean functions of four variables, but also to functions of a smaller number of variables. The Karnaugh maps of general form for functions of three and two variables:

                 z
             0    1
     xy 00   0    1
        01   2    3
        11   6    7
        10   4    5

                 y
             0    1
      x  0   0    1
         1   2    3

In using the first table it is necessary to mentally identify the upper and lower edges, so that the elementary squares with numbers 0 and 4, and 1 and 5, are to be considered neighboring.

With the aid of certain additional tricks we can construct Karnaugh maps for 5 and 6 variables. For a larger number of variables, in the general case, the problem of finding the minimal representations directly from the tables of the boolean functions becomes so cumbersome that the corresponding Karnaugh maps are of little assistance. In these cases we must resort to analytic methods for the minimization of formulas, such as the Blake method and other similar methods.

Finding the minimal disjunctive normal forms for the output functions of boolean multi-terminal networks is extremely useful, not only for the synthesis of the two-stage circuits using AND and OR elements described above, but also for the synthesis of circuits using gate elements, usually termed simply gates.
A gate is a binary combination circuit with two input poles and one output pole. The gate either passes or does not pass to its output pole the signal applied at one of its input poles (termed the gated pole), depending on whether the signal applied to the second input pole (termed the control pole) is equal to one or to zero, respectively.

In gate circuits (i.e., in circuits composed of gates), the signals applied to the control poles of all the gates are equal to some of the initial variables x, y, z, ... and their negations x̄, ȳ, z̄, .... In addition, the circuit has one more input pole, to which the gating input signal, identically equal to one, is applied. For the gating signals the property of natural separation (see above) is satisfied, which ensures that when several gating signals are applied to the same pole, the signal appearing on this pole is equal to the disjunction of all these signals. The output signals of the gate circuits are also signals of the gating type.

A gate circuit with a single output pole can be constructed directly from any formula which represents the output function of the circuit by means of the operations of multiplication and disjunction applied to the input variables and their negations (any disjunctive normal form is an example of such a formula). In this construction every multiplication corresponds to a series connection, and every disjunction to a parallel connection, of gates or of gate circuits composed of several gates.
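The correspondence between formulas and series/parallel connections can be simulated directly. A minimal Python sketch, assuming signals are the integers 0 and 1 and taking as the example the formula f = (x ∨ ȳ)z ∨ x̄y; the helper names `gate`, `series`, `parallel` and `circuit` are hypothetical.

```python
def gate(control, gated):
    """A gate passes its gated signal only when the control signal is 1."""
    return gated if control else 0


def series(controls, signal=1):
    """Gates in series: the gating signal passes only if every control is 1."""
    for c in controls:
        signal = gate(c, signal)
    return signal


def parallel(*signals):
    """Gating signals meeting at one pole combine by disjunction."""
    return int(any(signals))


def circuit(x, y, z):
    nx, ny = 1 - x, 1 - y                    # negated inputs are also available
    a = parallel(gate(x, 1), gate(ny, 1))    # parallel gates: x ∨ ȳ
    branch1 = gate(z, a)                     # then in series with z: (x ∨ ȳ)z
    branch2 = series([nx, y])                # series gates: x̄y
    return parallel(branch1, branch2)        # output pole: (x ∨ ȳ)z ∨ x̄y
```

Comparing `circuit` with the formula on all eight sets of input values confirms that the series/parallel structure realizes the formula term by term.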

If we designate a gate by a circle with the letter B inside, then a gate circuit composed in accordance with the formula f = (x ∨ ȳ)z ∨ x̄y will have the form shown in Fig. 6. At the internal node of the circuit designated by the letter A the gating signal x ∨ ȳ is generated (the result of the parallel connection of the gates with the control signals x and ȳ). At the node B the gating signal x̄ is generated, and at the node C the gating signal x̄y (the result of the series connection of the gates with the control signals x̄ and y). Finally, the output signal of the entire circuit (at pole D) is the result of the parallel connection of two gate networks with the output (gated) signals (x ∨ ȳ)z and x̄y.

The gate circuits include the so-called relay-contact circuits, which are constructed using electromagnetic relays. Gates of this type (relay contacts) have two-way conductivity, transmitting the gated signals not only in the forward direction (from the gate input pole to the output pole) but also in the opposite direction. This situation gives rise to additional difficulties in the construction of the theory of relay-contact circuits (associated with the existence of the so-called bridge circuits and the appearance of signal transmission paths which were not initially planned). Such difficulties do not usually arise in the case of electronic gates, which do not have two-way conductivity.

In the design of gate circuits using gates of all types the so-called cascade method (see [65]) can be of considerable assistance. This method is based on the use of the relation, valid for any boolean function f,

$$f(x_1, x_2, \ldots, x_n) = x_n\, f(x_1, x_2, \ldots, x_{n-1}, 1) \vee \bar{x}_n\, f(x_1, x_2, \ldots, x_{n-1}, 0) \qquad (34)$$

The validity of this formula is easy to see by setting $x_n = 1$ and $x_n = 0$ in it.
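Formula (34) and its repeated application can be verified exhaustively for small n. A minimal Python sketch, assuming functions are Python callables returning 0 or 1; the names `cofactors`, `check_expansion` and `cascade_value` are hypothetical. `cascade_value` applies (34) once per variable, mirroring a circuit of n cascades.

```python
from itertools import product


def cofactors(f):
    """The (n-1)-place functions f(..., 1) and f(..., 0) of formula (34)."""
    return (lambda *xs: f(*xs, 1)), (lambda *xs: f(*xs, 0))


def check_expansion(f, n):
    """Check f = x_n f(..., 1) ∨ ~x_n f(..., 0) on all 2**n sets of values."""
    f_one, f_zero = cofactors(f)
    for values in product((0, 1), repeat=n):
        *head, xn = values
        rhs = (xn & f_one(*head)) | ((1 - xn) & f_zero(*head))
        if rhs != f(*values):
            return False
    return True


def cascade_value(f, values):
    """Evaluate f by applying (34) once per variable, one cascade per step."""
    if not values:
        return f()
    f_one, f_zero = cofactors(f)
    *head, xn = values
    return cascade_value(f_one if xn else f_zero, head)


majority = lambda a, b, c: int(a + b + c >= 2)
print(check_expansion(majority, 3))  # True
```

Each recursive step removes the last variable, so after n steps only a constant remains, just as the n-th cascade leaves a circuit with no variables.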

In application to gate circuits, and also to circuits constructed using AND and OR elements, formula (34) reduces the problem of the synthesis of a circuit with the n-place output function $f(x_1, x_2, \ldots, x_n)$ to the problem of the synthesis of a circuit with two (n - 1)-place output functions

$$f_1(x_1, x_2, \ldots, x_{n-1}) = f(x_1, x_2, \ldots, x_{n-1}, 1), \qquad f_2(x_1, x_2, \ldots, x_{n-1}) = f(x_1, x_2, \ldots, x_{n-1}, 0).$$

The cascade of gate circuits realizing this reduction is shown in Fig. 7. With several output functions the circuit of our (n-th) cascade becomes more complicated; however, the reduction process itself remains essentially the same. Continuing the reduction process, we finally construct the required gate circuit, composed in the general case (for n variables) of n cascades.

The application of a synthesis method analogous to the cascade method described permitted Shannon [80] to establish the following estimate of the number of gates (of any type) required for the realization (in the form of the output function of some gate circuit) of an arbitrary boolean function of n arguments.

Theorem 1. For any real positive number ε there exists a whole number N = N(ε) such that any boolean function of n > N variables can be realized in the form of the output function of a gate circuit containing no more than … gates. For a similar realization with any n, no more than … gates are required.

Similar complexity estimates of circuits, but under more general assumptions about the sets of logic elements used, were established by Lupanov (see [51], for example). It has also been shown that there exist boolean functions which cannot be realized by fewer than $(1 - \varepsilon_n)$ … gates, where the quantity $\varepsilon_n$ tends to zero with unlimited increase of n.

It is of interest to generalize the results presented to the case of arbitrary boolean (m, n)-terminal networks constructed with the use of any two-input logic elements. Since every boolean (m, n)-terminal network realizes some alphabetic operator A with a finite domain of definition, the minimal possible complexity L(A) of the boolean (m, n)-terminal networks realizing the given operator A can be taken as the natural quantitative estimate of the complexity of the operator A itself. Here, in view of the absence of adequately substantiated reasons to give preference to any particular two-input logic element, it is clearly most natural, in the construction of the indicated boolean (m, n)-terminal networks, to make use of all the types of two-input logic elements, taking the circuit complexity to be the total number of logic elements composing it.

The described method is not directly suitable for estimating the complexity of alphabetic operators with infinite domains of definition. If, however, we are required to obtain not an absolute estimate but only a relative practical estimate of the complexity of several alphabetic operators, we can first finitize (make finite) their domains of definition, discarding all the input words whose lengths exceed some number N. This number must be selected so that the probability of encountering input words longer than N in the practical application of the operators in question is sufficiently small.
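Choosing the cutoff N can be made concrete once the word-length probabilities are known. A minimal Python sketch, assuming the probabilities p(n) of words of length n = 1, 2, ... sum to 1 and that a tolerance `delta` for the discarded tail is given; the function name `choose_cutoff` and the geometric distribution p(n) = 2^-n in the usage line are hypothetical.

```python
def choose_cutoff(p, delta, max_n=10_000):
    """Smallest N such that words longer than N occur with probability <= delta.

    `p(n)` is the probability of an input word of length n (n = 1, 2, ...);
    the probabilities are assumed to sum to 1.
    """
    cumulative = 0.0
    for n in range(1, max_n + 1):
        cumulative += p(n)
        if 1.0 - cumulative <= delta:
            return n
    raise ValueError("no cutoff found below max_n")


print(choose_cutoff(lambda n: 2.0 ** -n, 1e-3))  # 10
```

The tail probability is computed as 1 minus the cumulative sum, so the sketch is only as reliable as the assumption that the p(n) are normalized.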

If for all n = 1, 2, ... the probabilities p(n) of the occurrence of input words of length n are given, then we can also proceed as follows: the given alphabetic operator A is divided into the operators $A_1, A_2, \ldots$ so that the operator $A_n$ has as its domain of definition the set of all words from the domain of definition of the operator A whose length is equal to n, and it acts on these words just like the operator A (n = 1, 2, ...). Let L(n) be the complexity of the operator $A_n$ computed by the method described above. Then the complexity of the original alphabetic operator is quite naturally taken to be the infinite sum

$$\sum_{n=1}^{\infty} L(n)\, p(n).$$
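In practice the infinite sum is evaluated through a partial sum. A minimal Python sketch, assuming L(n) and p(n) are supplied as callables; the function name `expected_complexity`, the default number of terms, and the illustrative choices L(n) = n and p(n) = 2^-n (for which the sum equals 2) are hypothetical.

```python
def expected_complexity(L, p, terms=200):
    """Partial sum of L(n) * p(n) over n = 1..terms, approximating the infinite sum."""
    return sum(L(n) * p(n) for n in range(1, terms + 1))


print(round(expected_complexity(lambda n: n, lambda n: 2.0 ** -n), 9))  # 2.0
```

The sum converges only if L(n) grows slowly enough relative to the decay of p(n); the truncation is meaningful only under that assumption.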

More rational estimates of the alphabetic operators can be obtained by using discrete automata with memory, in place of combination circuits, for the representation of the operators. The fundamentals of the theory of such automata are considered in the following (third) chapter of the present book.

§5. THE CONCEPT OF PROPOSITIONAL CALCULUS

Propositional calculus is the initial and simplest part of mathematical logic. The primary problem which mathematical logic sets itself is the formalization of the complex thought processes which make up so-called logical thought. This formalization is achieved by constructing logical calculi.

Every logical calculus includes first of all some means for formalizing the writing of various sorts of statements about which it makes sense to say that they are true or false. In mathematical logic it is customary to call such a statement a proposition. The formalization considered here amounts to the introduction of a rigorously defined system of symbols designating the various operations which make it possible to construct more complex propositions from simpler ones. As a result of the formalization, we are able to write propositions in the form of formulas constructed from the introduced symbols according to definite rules.

In spite of the great importance of formalizing the writing of propositions, formalization in itself does not constitute a calculus. To construct a particular logical calculus it is also necessary to define certain formulas and operations on formulas, termed the axioms and derivation rules of the calculus, which make it possible to derive formally all possible logical corollaries from any given system of statements and to characterize formally all the so-called identically true propositions (formulas) of the calculus in question.

In order to understand what identically true propositions are, let us consider some examples. Propositions such as "oxygen is a gas" or "two times two is eleven" are examples of the so-called elementary constant propositions. Their elementary nature consists in the fact that they cannot be divided into simpler component parts which would themselves be propositions. Indeed, the expressions "is a gas" or "two times two" are not complete propositions, since the question of truth or falsity has no meaning for them. The term "constant" applied to these propositions emphasizes that we are considering completely defined propositions relating to completely defined areas of knowledge.

We note that the truth or falsity of these propositions depends on the conditions in which they are considered and is established, as a rule, outside the limits of mathematical logic. For the first proposition this is obvious (oxygen under certain conditions can be not only a gas but also a liquid or even a solid). The second proposition, however, at first glance seems obviously false. Actually, though, all we have to do is assume that in place of the decimal number system we are using the ternary system while retaining the familiar names of multidigit numbers, and the proposition "two times two is eleven" (2 × 2 = 1 · 3 + 1 = 11 in ternary notation) changes from false to true.

Therefore, in the applications of mathematical logic it is necessary to specify the conditions under which a particular constant proposition is made so accurately and definitely that the truth value of this proposition cannot change in the process of obtaining various conclusions and corollaries from it within the framework of the logical calculus being used. Thus, any constant proposition must be considered to be true all the time or false all the time throughout the entire duration of a particular logical derivation.

In propositional calculus we are not interested in the internal structure of the elementary propositions, considering them as whole units. Therefore it is natural to designate them by individual letters of some alphabet (usually Latin). Individual letters can also be used to designate the so-called variable propositions. The term "variable proposition" applied to a particular symbol means that any specific constant proposition, either true or false, can always be substituted in place of this symbol.

Propositions, both constant and variable, can be combined into complex propositions by using as connectives the words "and", "or", "if - then", "not", etc. If variable propositions occur in a complex proposition, then when they are replaced by certain propositions the complex proposition may be true, and when they are replaced by others it may be false. For example, the complex proposition "A and B", where A and B are variable propositions, will obviously be true if and only if both propositions A and B are true.

However, there do exist complex propositions containing variable propositions which remain true for any values given to these variable propositions. For example, the complex proposition "if it is incorrect that the proposition A is false, then the proposition A is true" remains true no matter what proposition is substituted in place of the variable proposition A. Such propositions are customarily called identically true propositions. Separating the identically true propositions from the set of all possible propositions is a most important task of any logical calculus.

After  all  our  preliminary  remarks  we  turn  to  the  construction  of 
the  propositional  calculus  itself. 

Propositional calculus is constructed from formal objects of three types. The objects of the first type are the variable and constant propositions which are not separable into individual component parts. To designate them we shall use capital Latin letters (with or without subscripts), calling them propositional letters. The objects of the second type are the propositional connectives, the formal equivalents of the connective words "not", "or", "and", "if - then" presented above. To designate them we shall use the corresponding symbols of negation (¬), disjunction (∨), conjunction (∧) and implication (⊃). We note that in reading formulas it is more convenient to replace the implication symbol by the word "implies" rather than by the words "if - then". The objects of the third type are the parentheses, which serve to express the order in which the propositional connectives listed are to operate.

The formulas of propositional calculus are constructed from the formal objects introduced, similarly to the way in which the formulas of boolean algebra were constructed at the beginning of §2. The difference lies in the use of the additional symbol ⊃ (the formal analog of the boolean implication operation), in the replacement of the dot designating conjunction (multiplication) by the symbol ∧, and in the use of the symbol ¬ standing before the negated expression, in place of the bar over the negated symbol, as the sign of negation.

The formulas of propositional calculus are the individual letters and also all the expressions constructed recursively from already constructed formulas 𝔄 and 𝔅 by the rules ¬(𝔄), (𝔄) ∧ (𝔅), (𝔄) ∨ (𝔅), (𝔄) ⊃ (𝔅). Just as in the case of the formulas of boolean algebra, in order to simplify the writing, some of the parentheses can be dropped if this does not cause any ambiguity in the order of application of the propositional connectives. It is assumed that in the absence of parentheses the order of application is determined by the sequence ¬, ∧, ∨, ⊃, and for like connectives the order is that of their appearance in the formula, read from left to right. In several cases an additional symbol ≡ is introduced in propositional calculus as an abbreviated notation, in the form (𝔄) ≡ (𝔅), of the expression ((𝔄) ⊃ (𝔅)) ∧ ((𝔅) ⊃ (𝔄)). In using this symbol it is assumed that it occupies the last place in the sequence of symbols just written out (after the symbol ⊃).

As an illustration of the method used to reduce the number of parentheses, we note that the formula ¬A ∨ B ∧ C ⊃ B ∨ C is understood as ((¬A) ∨ (B ∧ C)) ⊃ (B ∨ C) and not in any other way; the formula A ≡ B ⊃ C must be understood as (A) ≡ ((B) ⊃ (C)) and not as ((A) ≡ (B)) ⊃ (C), etc.
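The parenthesis conventions amount to an operator-precedence grammar, so they can be checked with a small precedence-climbing parser. A minimal Python sketch, assuming letters are capital Latin letters optionally followed by digits and that the connectives are written with the symbols ¬, ∧, ∨, ⊃, ≡; the names `PRECEDENCE`, `tokenize` and `parenthesize` are hypothetical. It restores every parenthesis, treating ¬ as the tightest-binding connective, then ∧, ∨, ⊃, ≡, with equal connectives grouped left to right.

```python
PRECEDENCE = {"≡": 1, "⊃": 2, "∨": 3, "∧": 4}  # ¬ binds tighter than all of these


def tokenize(text):
    """Split a formula into letters (capital + optional digits), connectives, parentheses."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isupper():
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch in PRECEDENCE or ch in "¬()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def parenthesize(text):
    """Return the formula with all parentheses restored."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of formula")
        pos += 1
        return tokens[pos - 1]

    def operand():
        tok = take()
        if tok == "¬":
            return f"(¬{operand()})"
        if tok == "(":
            inner = expression(1)
            if take() != ")":
                raise ValueError("missing ')'")
            return inner
        if tok[0].isupper():
            return tok
        raise ValueError(f"unexpected token {tok!r}")

    def expression(min_prec):
        left = operand()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = take()
            right = expression(PRECEDENCE[op] + 1)  # +1: equal connectives group to the left
            left = f"({left} {op} {right})"
        return left

    result = expression(1)
    if pos != len(tokens):
        raise ValueError("unexpected tokens after formula")
    return result


print(parenthesize("¬A ∨ B ∧ C ⊃ B ∨ C"))  # (((¬A) ∨ (B ∧ C)) ⊃ (B ∨ C))
print(parenthesize("A ≡ B ⊃ C"))          # (A ≡ (B ⊃ C))
```

Raising the minimum precedence by one for the right operand is what makes a chain such as A ⊃ B ⊃ C group as ((A ⊃ B) ⊃ C), following the left-to-right rule.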

The definitions introduced above resolve only the first part of the problem of constructing the propositional calculus: the formalization of the writing of complex propositions. The second part of this problem, finding a method for determining the identically true propositions, can be resolved in two ways: by the contensive approach and by the formal approach.

In the contensive approach, which is easier to understand, we cannot for a moment forget the contensive meaning of the concepts of the propositional letters and the propositional connectives. At the same time, this contensive meaning is stripped down as far as possible. Thus, if the propositional letter A designates a particular constant proposition, there is no need to remember this proposition itself; it is necessary only to know the value of the so-called truth function of this proposition: "true" if the proposition A is true, and "false" if it is false.

The truth function of any propositional letter denoting a variable proposition is identified with this letter itself, considered as a boolean variable. Thus, the contensive meaning of the propositional letters in our construction is exhausted by their capability of taking two values: "true" and "false".

We shall associate the contensive meaning of the propositional connectives only with the truth functions of the complex propositions constructed with their use. Every formula A of propositional calculus can be interpreted as a formula of boolean algebra with the additional operation of implication included. The constant propositions appearing in the formula A must be replaced by the corresponding boolean constants (the values of their truth functions). The symbols corresponding to the variable propositions are considered as arguments of the boolean function represented by the formula A. This function is then termed the truth function of the complex proposition expressed by the formula A.

On the contensive level of the construction of propositional calculus, those and only those formulas of this calculus (complex propositions) whose truth functions take the value "true" for all values of the variables are considered to be identically true.
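On the contensive level, deciding identical truth is a finite check over all 2^n sets of values. A minimal Python sketch, assuming a formula is supplied as a Python function of its variable propositions returning 1 (true) or 0 (false); the names `implies`, `neg` and `identically_true` are hypothetical.

```python
from itertools import product


def implies(a, b):
    """Truth function of ⊃: false only when a is true and b is false."""
    return int((not a) or b)


def neg(a):
    return 1 - a


def identically_true(truth_function, n):
    """True if the truth function equals 1 on all 2**n sets of values."""
    return all(truth_function(*values) for values in product((0, 1), repeat=n))


# "If it is incorrect that A is false, then A is true": ¬¬A ⊃ A
print(identically_true(lambda a: implies(neg(neg(a)), a), 1))  # True
# "A and B" is true only when both A and B are true
print(identically_true(lambda a, b: a & b, 2))  # False
```

The cost of this check doubles with every additional variable, which is one motivation for the formal (axiomatic) approach mentioned above.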

We recall that, by the agreement made in §1, the value "true" corresponds to one and the value "false" to zero. Using the value tables presented in §1 for the conjunction x ∧ y (the table expressed by the cortege (0001)), for the disjunction x ∨ y (cortege (0111)) and for the implication x ⊃ y (cortege (1101)), and recalling that negation transforms 1 into 0 and 0 into 1, it is easy to find the value table of the truth function for any formula of propositional calculus. This table is usually termed the truth table of the formula in question (or of the complex proposition defined by it). In filling in the table we use the abbreviated designations T for true and F for false. As an example we present the truth table for the formula A ⊃ B, which defines the contensive meaning of implication (considered as a propositional connective):

     A   B   A ⊃ B
     T   T     T
     T   F     F
     F   T     T
     F   F     T
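The corteges and the truth table of A ⊃ B can be reproduced mechanically. A minimal Python sketch, assuming a cortege lists the values on the sets 00, 01, 10, 11 in that order, as in §1; the names `implies`, `cortege` and `truth_table` are hypothetical.

```python
from itertools import product


def implies(a, b):
    return int((not a) or b)


def cortege(truth_function):
    """Values of a two-place truth function on the sets 00, 01, 10, 11, in that order."""
    return tuple(int(truth_function(a, b)) for a, b in product((0, 1), repeat=2))


def truth_table(truth_function):
    """Rows (A, B, value) labelled T/F, starting from A = B = T."""
    label = {1: "T", 0: "F"}
    return [(label[a], label[b], label[int(truth_function(a, b))])
            for a, b in product((1, 0), repeat=2)]


print(cortege(lambda a, b: a and b))  # (0, 0, 0, 1)
print(cortege(lambda a, b: a or b))   # (0, 1, 1, 1)
print(cortege(implies))               # (1, 1, 0, 1)
for row in truth_table(implies):
    print(*row)
```

The loop prints the rows T T T, T F F, F T T and F F T, so the only false case of the implication is a true antecedent with a false consequent.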

From this table we see that the meaning of the term "implies" (corresponding to the propositional connective ⊃) in propositional calculus is somewhat different from its meaning in ordinary speech. Indeed, when we say that some proposition A implies another proposition B, we usually have in mind that A and B are causally related. Thus, the complex proposition stating that the proposition "this substance is oxygen" implies the proposition "this substance is a gas" seems to us (with the reservation made above about the gaseous nature of oxygen) both true and reasonable. At the same time, the complex proposition stating that the proposition "it is cold in winter" implies the proposition "two times two is four" seems to us complete nonsense. However, by the truth table constructed above for the formula A ⊃ B, in propositional calculus the second of these complex propositions must be considered true to no less a degree than the first, since both of its components are true.
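The procedure for filling in a truth table described above is easy to mechanize. The following Python sketch is only an illustration: the function names are ours, and the ordering 00, 01, 10, 11 of the argument sets is an assumption, chosen because it reproduces the corteges (0001), (0111) and (1101) quoted from §1. It also prints the T/F truth table of A ⊃ B discussed in the text.

```python
from itertools import product

# Truth functions of the connectives; Python True/False play the roles of T/F (1/0).
def conj(x, y):
    return x and y

def disj(x, y):
    return x or y

def impl(x, y):
    return (not x) or y  # false only for x = T, y = F

def mark(v):
    return "T" if v else "F"

def cortege(f):
    """Values of a two-place truth function on the argument sets
    00, 01, 10, 11 (assumed ordering), as a string of 0s and 1s."""
    return "".join(str(int(f(x, y))) for x, y in product([False, True], repeat=2))

def truth_table(f, arity):
    """Rows (values of the variables..., value of f) written with T and F."""
    return [tuple(mark(v) for v in values) + (mark(f(*values)),)
            for values in product([True, False], repeat=arity)]

print(cortege(conj), cortege(disj), cortege(impl))  # 0001 0111 1101
for row in truth_table(impl, 2):
    print(*row)
```

The only row in which A ⊃ B takes the value F is the one with A = T and B = F, which is exactly the cortege (1101).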

The reason for this presumption is not difficult to understand. By agreeing to consider the elementary propositions only from the standpoint of whether they are true or false, we have thereby made all true (and all false) elementary propositions quite indistinguishable from one another. Therefore, in particular, in defining the content embedded in the connective ⊃ we are forced to operate only with the concepts of truth and falsity, and along this path it is obviously impossible to penetrate into the inner structure of the elementary propositions, which, of course, is necessary for establishing causal connections between them.

Disclosure of the internal structure of the elementary propositions, and the associated increase in the capabilities for logical analysis, are achieved by means of more complex logical calculi, in particular the so-called predicate calculus (see Chapter 6). As for propositional calculus, we must content ourselves with the relative poverty of its expressive capabilities, accepting this as a sort of payment for the simplicity and clarity of this calculus.

The propositional connective ⊃ is used later on as an instrument for obtaining logical corollaries from particular formulas of propositional calculus and of other, higher logical calculi. Such corollaries must be true whenever the original formulas are true. Therefore, the construction of a derivation must of necessity exclude the possibility of obtaining false corollaries from true original formulas (which is why the implication takes the value "false" in this case). At the same time, when the original information is false, obtaining any corollaries (true as well as false) does not, of course, indicate that the construction of the derivation itself is faulty. This circumstance finds its concrete expression in the truth table of the formula A ⊃ B. All we have said here will become more understandable after acquaintance with the formal aspect of propositional calculus.
The contensive aspect of propositional calculus which we have described makes it relatively easy to resolve the question of the identical truth of any complex proposition (given by a formula of the calculus): it suffices to run through all possible sets of truth values of the variable propositions composing it and verify whether on all these sets the truth function of the complex proposition in question takes the value "true." For example, the formula A ∧ B ⊃ A is an identically true formula of propositional calculus on the basis of the following verification. If A = F and B = F, the formula A ∧ B ⊃ A reduces to F ⊃ F, which by the truth table gives the value T; the same is the case with A = F and B = T. With A = T, depending on the value of B (F or T), we reduce our formula either to F ⊃ T or to T ⊃ T, which by the truth tables leads in both cases to the value T.
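The "running through all sets of truth values" described above is a simple enumeration. In the Python sketch below, formulas are encoded as nested tuples, an encoding chosen only for illustration (the names `letters`, `value` and `identically_true` are ours); `identically_true` tries all 2ⁿ assignments to the n letters of a formula.

```python
from itertools import product

# Illustrative encoding: a letter is a str; compound formulas are
# ("not", F), ("and", F, G), ("or", F, G), ("imp", F, G).
def letters(f):
    """Set of propositional letters occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(letters(g) for g in f[1:]))

def value(f, env):
    """Truth value of f under env, a dict mapping letters to True/False."""
    if isinstance(f, str):
        return env[f]
    op, *args = f
    v = [value(g, env) for g in args]
    if op == "not":
        return not v[0]
    if op == "and":
        return v[0] and v[1]
    if op == "or":
        return v[0] or v[1]
    if op == "imp":
        return (not v[0]) or v[1]
    raise ValueError(f"unknown connective {op!r}")

def identically_true(f):
    """Run through all 2**n sets of truth values of the n letters of f."""
    names = sorted(letters(f))
    return all(value(f, dict(zip(names, values)))
               for values in product([True, False], repeat=len(names)))

print(identically_true(("imp", ("and", "A", "B"), "A")))  # True
print(identically_true(("imp", "A", "B")))                # False
```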

We can also use the technique of transformations in boolean algebra to prove the identical truth of formulas of propositional calculus. We need only first replace all implications according to the formula x ⊃ y = ¬x ∨ y (see §1). Applying this to the example we have just considered, we obtain the following chain of transformations:

A ∧ B ⊃ A = ¬(A ∧ B) ∨ A = (¬A ∨ ¬B) ∨ A = A ∨ ¬A ∨ ¬B = 1 ∨ ¬B = 1.

This chain proves the identical truth of the formula we started with.
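Each link of such a chain must be an equivalence of boolean functions. As an illustration only, the short Python check below evaluates every link on all four sets of values of A and B. Implication is written via its truth table (`imp`, a name we introduce) rather than via ¬x ∨ y, so that the first step is actually tested rather than assumed.

```python
from itertools import product

def imp(x, y):
    """x ⊃ y by its truth table: F only when x is T and y is F."""
    return y if x else True

# The links of the chain, each as a truth function of A and B.
chain = [
    ("A ∧ B ⊃ A",     lambda A, B: imp(A and B, A)),
    ("¬(A ∧ B) ∨ A",  lambda A, B: (not (A and B)) or A),
    ("(¬A ∨ ¬B) ∨ A", lambda A, B: ((not A) or (not B)) or A),
    ("A ∨ ¬A ∨ ¬B",   lambda A, B: A or (not A) or (not B)),
    ("1 ∨ ¬B",        lambda A, B: True or (not B)),
    ("1",             lambda A, B: True),
]

def equivalent(f, g):
    """Two truth functions of A, B agree on all four argument sets."""
    return all(f(a, b) == g(a, b) for a, b in product([False, True], repeat=2))

for (s1, f1), (s2, f2) in zip(chain, chain[1:]):
    print(f"{s1} = {s2}: {equivalent(f1, f2)}")
```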

Identically false propositions can be considered similarly, i.e., those (complex) propositions whose truth functions take the value "false" for all values of the variable propositions composing them. It is easy to understand that, up to equivalence of formulas, the class of all identically false propositions coincides with the class of negations of all possible identically true propositions.

In spite of its simplicity and clarity, the contensive aspect of propositional calculus also has several drawbacks. First, the method of proving the truth of formulas by running through all sets of values of the arguments cannot be transferred directly to more complex calculi, in which the number of such sets may be infinite. Second, the methods described above directly determine only identically true propositions, and not propositions which are true under particular additional assumptions (for example, the formula A ⊃ B, which is not identically true, becomes true under the condition that the formula A ∧ ¬B is false). Yet problems of this sort constantly arise in the various applications of logical calculus. We can, it is true, develop the corresponding methods within the framework of boolean algebra; however, in that case another essential difficulty associated with the contensive aspect of propositional calculus becomes more acute: the insufficient formalization of the proof process and of the very concept of a proof of the truth of particular formulas.

These deficiencies are eliminated in the completely formal approach to the construction of propositional calculus, which formalizes not only the method of writing formulas (the method already described is adequate for this), but also the concept of identically true formulas and the process of deriving logical corollaries from particular propositions.

The formal aspect of propositional calculus is characterized by the fact that here we completely disregard the contensive meaning of the formulas, regarding them simply as finite sequences of individually distinguishable symbols.

To characterize the set of all identically true formulas, we construct an axiom system for the calculus in question. Such systems can be chosen in various ways. We shall consider one of the most widely used axiom systems of propositional calculus (see Kleene [42]). This system includes the following axioms:

1. A ⊃ (B ⊃ A).

2. (A ⊃ B) ⊃ ((A ⊃ (B ⊃ C)) ⊃ (A ⊃ C)).

3. A ⊃ (B ⊃ A ∧ B).

4. A ∧ B ⊃ A.

5. A ∧ B ⊃ B.

6. A ⊃ A ∨ B.

7. B ⊃ A ∨ B.

8. (A ⊃ C) ⊃ ((B ⊃ C) ⊃ (A ∨ B ⊃ C)).

9. (A ⊃ B) ⊃ ((A ⊃ ¬B) ⊃ ¬A).

10. ¬¬A ⊃ A.

11. A, A ⊃ B / B (from A and A ⊃ B, infer B).

The first ten axioms are simply ten formulas of propositional calculus which are identically true by definition. The identical truth of the axioms permits the substitution of any formulas of propositional calculus (not necessarily true ones) in place of the propositional letters A, B, C appearing in them. Such a substitution, by definition, does not destroy the identical truth of the formula (axiom) subjected to it.
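The text later states that axioms 1-10 are also contensively true. Since each of them involves at most the letters A, B, C, this can be confirmed by checking eight sets of truth values per axiom. The Python sketch below does so; writing the axioms as lambdas is our illustrative choice, and rule 11 is left out because it is a rule of derivation rather than a formula.

```python
from itertools import product

def imp(x, y):
    """Truth function of x ⊃ y: false only when x is T and y is F."""
    return y if x else True

# Axioms 1-10 as truth functions of A, B, C (letters an axiom lacks are ignored).
AXIOMS = {
    1: lambda A, B, C: imp(A, imp(B, A)),
    2: lambda A, B, C: imp(imp(A, B), imp(imp(A, imp(B, C)), imp(A, C))),
    3: lambda A, B, C: imp(A, imp(B, A and B)),
    4: lambda A, B, C: imp(A and B, A),
    5: lambda A, B, C: imp(A and B, B),
    6: lambda A, B, C: imp(A, A or B),
    7: lambda A, B, C: imp(B, A or B),
    8: lambda A, B, C: imp(imp(A, C), imp(imp(B, C), imp(A or B, C))),
    9: lambda A, B, C: imp(imp(A, B), imp(imp(A, not B), not A)),
    10: lambda A, B, C: imp(not (not A), A),
}

def identically_true(f):
    """f takes the value T on all eight sets of values of A, B, C."""
    return all(f(*values) for values in product([False, True], repeat=3))

for n, f in AXIOMS.items():
    print(n, identically_true(f))
```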

The eleventh axiom is of a special nature. This is the so-called rule of derivation, which makes it possible, by definition, to consider the truth of formula B proved if the truth of formulas A and A ⊃ B has already been proved. If the formulas A and A ⊃ B are identically true, then formula B will also be identically true. It is presumed by definition that all identically true formulas (and only such formulas) of propositional calculus can be obtained from the axioms as the result of the substitutions described and of (generally speaking, repeated) applications of derivation rule 11.


It by no means follows a priori that the set of all identically true formulas of propositional calculus characterized formally in this fashion will coincide with the set of all identically true formulas defined contensively above. Since the formal identical truth of formulas is established by the procedure of derivation, or proof, such formulas are also termed (formally) provable formulas or (formal) theorems.

For brevity, we shall term the formulas which are identically true in the contensive sense simply contensively true, contrasting them with the formally true (i.e., formally provable) formulas.

The concept of formal derivation (proof) can be extended to the case when, in addition to the axioms, some number of formulas A₁, A₂, ..., Aₙ of propositional calculus are given as conditionally true formulas. These formulas are, generally speaking, not derivable from the axioms (not formally provable) and therefore are not formally true formulas. The presumption of their truth is conditional and is retained only in the course of the derivation in question. In contrast with the axioms of propositional calculus, in these formulas we cannot, generally speaking, replace the propositional letters appearing in them by arbitrary formulas. In other words, conditional truth, in contrast with formal truth, is not identical in nature.

However, the rules of derivation themselves are in essence retained as before. The primary role, as before, is played by the concept of the direct corollary (axiom 11): formula B is termed the direct corollary of the formulas A and A ⊃ B. We say that a formula C of propositional calculus is derived from the formulas A₁, A₂, ..., Aₙ if it can be obtained from these formulas and axioms 1-10 of propositional calculus by applying the rule of the direct corollary a finite number of times. More precisely, in the case being considered those and only those formulas are derivable which are obtained as the result of the sequential application of three rules. 1. Any of the formulas A₁, A₂, ..., Aₙ is derivable. 2. Any of the axioms 1-10 (allowing for the substitution of any formulas in place of the letters appearing in them) is derivable. 3. If the formulas A and A ⊃ B are derivable, then formula B is also derivable. The chain of formulas obtained as the result of the sequential application of these three rules, which terminates with some formula C, is termed the formal derivation of this formula.

To designate derivability we use the special symbol ⊢ (read "gives"), to the left of which are written the conditionally true formulas and to the right their corollaries: A₁, A₂, ..., Aₙ ⊢ C. The axioms of propositional calculus are not written out explicitly here (the possibility of using them in the derivation is in effect included in the symbol ⊢), so that for any formally true formula B we can write ⊢ B. In other words, the formally true formulas are considered derivable from the empty set of (conditionally true) formulas. Therefore axioms 1-10 can also be considered as a sort of rules of derivation which derive the formulas they represent from the empty set of formulas.

We shall present very simple examples of formal derivation, numbering the successive steps.

1. A ⊃ (A ⊃ A) (axiom 1, in which the letter B is replaced by the letter A).

2. (A ⊃ (A ⊃ A)) ⊃ ((A ⊃ ((A ⊃ A) ⊃ A)) ⊃ (A ⊃ A)) (axiom 2, in which the letter B is replaced by the formula A ⊃ A, and the letter C by the letter A).

3. (A ⊃ ((A ⊃ A) ⊃ A)) ⊃ (A ⊃ A) (application of derivation rule 11 to the formulas obtained in steps 1 and 2).

4. A ⊃ ((A ⊃ A) ⊃ A) (axiom 1, in which the letter B is replaced by the formula A ⊃ A).

5. A ⊃ A (application of derivation rule 11 to the formulas obtained in the preceding two steps).

This chain of formulas is, by definition, the formal proof of the formula A ⊃ A, i.e., its derivation from the empty set of (conditionally true) formulas. Thus, the formula A ⊃ A belongs to the formally true formulas, and we can write ⊢ A ⊃ A.
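A formal derivation can be checked mechanically, without any reference to the meaning of the formulas, which is precisely the point of the formal approach. The Python sketch below is a minimal illustration under our own conventions (tuple-encoded formulas; the names `SCHEMES`, `match` and `check_derivation` are hypothetical): a step is accepted if it is a given formula, a substitution instance of one of axioms 1-10, or a direct corollary (rule 11) of two earlier steps. It accepts the five-step proof of A ⊃ A given in the text.

```python
# Formulas: a letter is a str; compound formulas are tuples
# ("imp", F, G), ("and", F, G), ("or", F, G), ("not", F).
def imp(f, g):
    return ("imp", f, g)

def conj(f, g):
    return ("and", f, g)

def disj(f, g):
    return ("or", f, g)

def neg(f):
    return ("not", f)

# Axioms 1-10; inside a scheme the letters A, B, C act as metavariables.
SCHEMES = [
    imp("A", imp("B", "A")),
    imp(imp("A", "B"), imp(imp("A", imp("B", "C")), imp("A", "C"))),
    imp("A", imp("B", conj("A", "B"))),
    imp(conj("A", "B"), "A"),
    imp(conj("A", "B"), "B"),
    imp("A", disj("A", "B")),
    imp("B", disj("A", "B")),
    imp(imp("A", "C"), imp(imp("B", "C"), imp(disj("A", "B"), "C"))),
    imp(imp("A", "B"), imp(imp("A", neg("B")), neg("A"))),
    imp(neg(neg("A")), "A"),
]

def match(scheme, formula, subst):
    """Extend `subst` so that substituting it into `scheme` yields `formula`;
    return None if no such substitution exists."""
    if isinstance(scheme, str):  # metavariable
        if scheme in subst:
            return subst if subst[scheme] == formula else None
        return {**subst, scheme: formula}
    if isinstance(formula, str) or formula[0] != scheme[0]:
        return None
    for s, f in zip(scheme[1:], formula[1:]):
        subst = match(s, f, subst)
        if subst is None:
            return None
    return subst

def is_axiom(formula):
    return any(match(s, formula, {}) is not None for s in SCHEMES)

def check_derivation(steps, given=()):
    """True if every step is a given formula, an axiom instance,
    or follows by rule 11 from two earlier steps F and F ⊃ G."""
    for i, g in enumerate(steps):
        earlier = steps[:i]
        if not (g in given or is_axiom(g)
                or any(imp(f, g) in earlier for f in earlier)):
            return False
    return True

AA = imp("A", "A")
proof = [
    imp("A", AA),                                        # axiom 1, B := A
    imp(imp("A", AA), imp(imp("A", imp(AA, "A")), AA)),  # axiom 2, B := A ⊃ A, C := A
    imp(imp("A", imp(AA, "A")), AA),                     # rule 11, steps 1 and 2
    imp("A", imp(AA, "A")),                              # axiom 1, B := A ⊃ A
    AA,                                                  # rule 11, steps 4 and 3
]
print(check_derivation(proof))  # True
```

Passing conditionally true formulas through `given` lets the same checker handle derivations of the form A₁, ..., Aₙ ⊢ C.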

Another example is the derivation of corollaries from the three conditionally true formulas A, B, A ⊃ (B ⊃ C). The formula C can be derived from these formulas in 5 steps.

1. A (first given (conditionally true) formula).

2. B (second given formula).

3. A ⊃ (B ⊃ C) (third given formula).

4. B ⊃ C (direct corollary, by derivation rule 11, of formulas 1 and 3).

5. C (direct corollary of formulas 2 and 4).

Thus, formula C is derivable from the formulas A, B, A ⊃ (B ⊃ C), and we can write A, B, A ⊃ (B ⊃ C) ⊢ C.
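When only rule 11 is needed, as in this example, the corollaries obtainable from a finite set of conditionally true formulas can be computed by saturation: keep applying rule 11 until nothing new appears. The sketch below is our own illustration (`mp_closure` is a hypothetical name, and axioms 1-10 are deliberately left out); it terminates because rule 11 only ever produces subformulas of formulas already present.

```python
def imp(f, g):
    return ("imp", f, g)

def mp_closure(premises):
    """All formulas obtainable from `premises` by repeated use of rule 11
    (from F and F ⊃ G obtain G); axioms 1-10 are not used."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for f in list(known):
            if isinstance(f, tuple) and f[0] == "imp" and f[1] in known and f[2] not in known:
                known.add(f[2])
                changed = True
    return known

premises = ["A", "B", imp("A", imp("B", "C"))]
print("C" in mp_closure(premises))  # True
```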

Although conditionally true formulas do not possess identical truth, still, as is easy to see, in the final record of (conditional) derivability using the symbol ⊢ any letter can be replaced by an arbitrary formula of propositional calculus, provided the replacement is performed simultaneously on both sides of the derivability symbol. Replacement on only one side will, generally speaking, lead to error.

In similar fashion we can prove the relations

chain inference: A ⊃ B, B ⊃ C ⊢ A ⊃ C; (35)
permutation of premises: A ⊃ (B ⊃ C) ⊢ B ⊃ (A ⊃ C); (36)
importation: A ⊃ (B ⊃ C) ⊢ A ∧ B ⊃ C; (37)
exportation: A ∧ B ⊃ C ⊢ A ⊃ (B ⊃ C); (38)
contraposition: A ⊃ B ⊢ ¬B ⊃ ¬A; (39)
∧-insertion: A, B ⊢ A ∧ B; (40)
weak ¬-removal: A, ¬A ⊢ B. (41)
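A derivability relation can hold only if it is also valid contensively: every set of truth values making the formulas on the left true must make the formula on the right true (this follows from the observation, made later in the text, that axioms 1-10 are contensively true and rule 11 preserves contensive truth). The Python sketch below checks this necessary condition for relations (35)-(41). It is a semantic sanity check under our own encoding, not a construction of the formal derivations themselves.

```python
from itertools import product

def imp(x, y):
    """Truth function of x ⊃ y."""
    return y if x else True

def entails(premises, conclusion):
    """True if every assignment to A, B, C making all premises true
    also makes the conclusion true."""
    for A, B, C in product([False, True], repeat=3):
        if all(p(A, B, C) for p in premises) and not conclusion(A, B, C):
            return False
    return True

RELATIONS = {
    35: ([lambda A, B, C: imp(A, B), lambda A, B, C: imp(B, C)],
         lambda A, B, C: imp(A, C)),
    36: ([lambda A, B, C: imp(A, imp(B, C))], lambda A, B, C: imp(B, imp(A, C))),
    37: ([lambda A, B, C: imp(A, imp(B, C))], lambda A, B, C: imp(A and B, C)),
    38: ([lambda A, B, C: imp(A and B, C)], lambda A, B, C: imp(A, imp(B, C))),
    39: ([lambda A, B, C: imp(A, B)], lambda A, B, C: imp(not B, not A)),
    40: ([lambda A, B, C: A, lambda A, B, C: B], lambda A, B, C: A and B),
    41: ([lambda A, B, C: A, lambda A, B, C: not A], lambda A, B, C: B),
}

for number, (premises, conclusion) in RELATIONS.items():
    print(number, entails(premises, conclusion))

# By contrast, the converse of an implication is not a valid consequence:
print(entails([lambda A, B, C: imp(A, B)], lambda A, B, C: imp(B, A)))  # False
```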


If we denote by Γ an arbitrary finite collection of formulas of propositional calculus, then, using a somewhat more complex method of proof (induction over the derivation), we can obtain the following result (the so-called deduction theorem).

Theorem 1. If in propositional calculus formula B is derivable from the collection of formulas Γ together with A, then the formula A ⊃ B is derivable from Γ.

Two universal proof schemes are also of importance in the theory of proofs.

1. Proof by analysis of cases: if Γ, A ⊢ C and Γ, B ⊢ C, then

Γ, A ∨ B ⊢ C. (42)

2. Reductio ad absurdum: if Γ, A ⊢ B and Γ, A ⊢ ¬B, then

Γ ⊢ ¬A. (43)

It is easy to verify that all the axioms 1-10 of propositional calculus are contensively true formulas. In other words, the truth functions corresponding to them take the value "true" for all values of the variables. This property is obviously retained under the substitution of any formulas of propositional calculus in place of the letters appearing in the axioms. From the truth table for implication it follows directly that the contensive truth of the formulas A and A ⊃ B implies the contensive truth of the formula B. But then, obviously, all the provable (formally true) formulas will necessarily be contensively true. The converse is also true (although much more complex to prove), so that the following important result can be formulated.

Theorem 2. In the formal construction of propositional calculus using the system of axioms 1-11, those and only those formulas of this calculus are provable (formally true) which are identically true in the contensive sense.

Theorem 2 actually contains two results relating to the selected system S of axioms of propositional calculus. The first result is that the system S is contensively consistent, or, in other words, that with the aid of S we cannot prove a single formula which is not contensively true.

The second result states the contensive completeness of the system of axioms S: there is not a single contensively true formula of propositional calculus which cannot be proved formally with the aid of this system of axioms.

The question arises whether the properties of consistency and completeness can be defined purely formally, without resorting to contensive constructions. It turns out that this is possible.

It is natural to term a system of axioms of propositional calculus formally consistent if with its aid we cannot derive any formula A together with its negation ¬A, and formally inconsistent otherwise.

From the property of weak ¬-removal it follows directly that, if the system of axioms were formally inconsistent, every formula of propositional calculus would be formally provable. Since, by Theorem 2, the latter situation does not occur for the system S, this system is not only contensively consistent but also formally consistent, or, as is often said, simply a consistent system of axioms.


The property of formal completeness of a system of axioms can be defined as follows: a system of axioms is termed formally complete (or complete in the restricted sense) if adding to it, as a new axiom, any formula which is not provable in the system makes the expanded system formally inconsistent. Here it is usually presumed that the original axiom system was formally consistent.

It can be shown that the system of axioms 1-11 of propositional calculus which we have introduced is not only a contensively but also a formally complete system of axioms. Given contensive consistency, and with rule 11 as the only derivation rule, contensive completeness follows from formal completeness, since otherwise any nonprovable contensively true formula could be used for a consistent expansion of the original axiom system.

In the axiom system we have chosen there is not a single redundant axiom. More precisely, none of the formulas 1-10 can be formally proved with the aid of the collection of all the remaining axioms. This property is termed the independence of the axioms of the selected system. Independence is proved separately for each axiom by constructing a contensive interpretation in which this axiom is not satisfied while all the remaining axioms are satisfied.

We note, finally, that although adjoining unprovable formulas as new axioms to the propositional calculus axiom system S which we have chosen destroys, by the formal completeness of this system, its formal consistency, nothing prevents us from adjoining to the system S formulas A₁, ..., Aₙ unprovable in S as conditionally true rather than identically true formulas. It can be shown that inconsistency (the possibility of deriving some formula together with its negation) arises in this case if and only if the conjunction A₁ ∧ A₂ ∧ ... ∧ Aₙ is an identically false formula.
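By the criterion just stated, whether a finite set of conditionally true formulas can be adjoined consistently reduces to a satisfiability question: is there at least one set of truth values making all of them true? A brute-force Python sketch follows; representing formulas as functions of an assignment dictionary, and the name `consistent`, are our illustrative choices.

```python
from itertools import product

def imp(x, y):
    """Truth function of x ⊃ y."""
    return y if x else True

def consistent(formulas, letters=("A", "B", "C")):
    """True unless the conjunction of `formulas` is identically false,
    i.e. unless no assignment to `letters` makes all of them true."""
    for values in product([False, True], repeat=len(letters)):
        env = dict(zip(letters, values))
        if all(f(env) for f in formulas):
            return True
    return False

# A ⊃ B together with A: satisfiable (A = B = T), so adjoining them is consistent.
print(consistent([lambda e: imp(e["A"], e["B"]), lambda e: e["A"]]))      # True
# A together with A ⊃ ¬A: no assignment works, so these two are inconsistent.
print(consistent([lambda e: e["A"], lambda e: imp(e["A"], not e["A"])]))  # False
```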

As the result of this adjoining there arises a formal theory which goes beyond the framework of mathematical logic proper, since the adjoined formulas A₁, A₂, ..., Aₙ are not true in the strictly logical sense. If our constructions have some particular contensive meaning, then the contensive truth of the formulas A₁, A₂, ..., Aₙ must be postulated or have some clearly extralogical basis. In that case it is natural to consider these formulas as axioms of the formal theory constructed on their basis. In order not to confuse them with the logical axioms 1-11 themselves, the latter are in this case termed not axioms but axiom schemes, thereby emphasizing that each of the axioms 1-11 is actually a whole set of axioms obtained from the formula corresponding to it by replacing the letters appearing in this formula by arbitrary formulas of propositional calculus.
Chapter 3

THEORY  OF  AUTOMATA 

§1.  ABSTRACT  AUTOMATA  AND  AUTOMATON  REPRESENTATIONS 

Let us consider the alphabetic transformations realizable by discrete information processors which put out some output signal (a letter of the output alphabet) in response to each input signal (a letter of the input alphabet). Such processors, considered without regard to their internal structure, are customarily termed abstract automata.

To specify an abstract automaton, three sets must be given: the input alphabet X, the output alphabet Y, and the set of internal states of the automaton, which we shall denote by the letter A. The automaton operates in discrete time, whose successive moments are conveniently identified with the successive natural numbers t = 0, 1, 2, ... (which we can always do by a suitable choice of the unit of time measurement).

At every given instant of discrete automaton time t = 0, 1, ... the automaton A is in some definite state a = a(t) of the set A of its internal states, which for brevity we shall term the state set of the automaton A. The state a(0) at the initial instant of time t = 0 is termed the initial state of the automaton A. If the initial state remains unchanged in all experiments with the automaton, then the automaton is termed an initial automaton. Since, however, in practice we do not consider any automata other than initial ones, the term "initial" is frequently dropped.
At every instant t of automaton time, beginning with t = 1, one of the letters x = x(t) of the input alphabet X is applied to the input of the automaton as the input signal. The finite ordered sequences of input signals x(1)x(2)...x(k) are termed the input words of the automaton. Any input word from some a priori fixed set of admissible input words can be applied to the input of the automaton.

Any admissible word p = x(1)x(2)...x(k) applied to the input of a given initial automaton A causes the appearance at its output of an output word q = y(1)y(2)...y(k), which is an ordered finite sequence of output signals of the automaton A (letters of its output alphabet Y), has the same length as the corresponding input word p, and is uniquely determined by the input word p. The resulting correspondence φ between the admissible input words p and their corresponding output words q is termed the (alphabetic) representation induced by the initial automaton A in question.

This representation φ is uniquely determined by specifying two functions δ and λ, termed respectively the switching function and the output function of the automaton A.

The switching function determines the state a(t) of the automaton at any instant t of discrete automaton time from the input signal x(t) at that same instant and the state a(t − 1) at the preceding instant of automaton time:

a(t) = δ(a(t − 1), x(t)). (44)

The output function determines the output signal y(t) of the automaton from these same variables:

y(t) = λ(a(t − 1), x(t)). (45)

Specifying any input word p = x(1)x(2)...x(k) and the initial state a(0) of the automaton, we can use relations (44) and (45) to determine successively all the letters of the corresponding output word

q = φ(p) = y(1)y(2)...y(k).

Thus, relations (44) and (45) actually define the representation φ induced by the automaton.

The switching and output functions are, generally speaking, partial functions δ(a, x) and λ(a, x), which specify single-valued mappings of some set of pairs (a, x) (a ∈ A, x ∈ X) into the sets A and Y respectively. Admissible input words are those and only those input words p for which the functions δ and λ, used in the manner described above, determine corresponding output words φ(p).

An automaton is termed finite if all three sets A, X, Y defining it are finite. Since we limit ourselves almost exclusively to the consideration of finite automata, the word "finite" is often dropped. An automaton is called completely determinate if its switching and output functions are defined on all pairs (a, x), and partially determinate otherwise.
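Relations (44) and (45), together with the notion of admissible words, translate directly into a simulation loop. In the Python sketch below, the partial functions δ and λ are stored as dictionaries keyed by pairs (state, input letter), an illustrative choice; a missing key means the function is undefined there, and the simulator then reports the word as inadmissible by returning None. The small automaton at the end is hypothetical, made up only to exercise both cases.

```python
def run_mealy(delta, lam, a0, word):
    """Output word φ(p) for the input word p = `word`, using
    a(t) = δ(a(t-1), x(t)) and y(t) = λ(a(t-1), x(t));
    returns None if the word is not admissible."""
    a, out = a0, []
    for x in word:
        if (a, x) not in delta or (a, x) not in lam:
            return None
        out.append(lam[(a, x)])
        a = delta[(a, x)]
    return out

def completely_determinate(delta, lam, states, inputs):
    """δ and λ are defined on every pair (a, x)."""
    return all((a, x) in delta and (a, x) in lam for a in states for x in inputs)

# Hypothetical partially determinate automaton: δ and λ undefined at (1, "b").
delta = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0}
lam = {(0, "a"): "p", (1, "a"): "q", (0, "b"): "p"}

print(run_mealy(delta, lam, 0, "aab"))                   # ['p', 'q', 'p']
print(run_mealy(delta, lam, 0, "ab"))                    # None
print(completely_determinate(delta, lam, [0, 1], "ab"))  # False
```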

Finite automata are customarily specified by two tables, termed respectively the switching table and the output table of the automaton. The rows of both tables are labeled by the different letters of the input alphabet X of the automaton, and the columns by the different states of the automaton. At the intersection of the x-th row and the a-th column of the switching table stands the element δ(a, x), i.e., some state of the automaton from the set of its internal states, and at the intersection of the x-th row and the a-th column of the output table stands the element λ(a, x), i.e., some letter of the output alphabet Y of the automaton. Thus the specification of the switching and output tables determines both the sets X, Y, A and the switching and output functions of the automaton. To fix the initial state, it is customary to label the leftmost column of both tables with this state. Thus, the two tables make it possible to specify any finite automaton, including initial automata.

Another method of specifying finite automata, which provides better visualization, is that of directed graphs. The vertices of the graph (shown as circles in the figures) are identified with the various states of the automaton. An arrow connecting vertex i with vertex j signifies that there exists an input signal x which transfers the automaton from state i into state j, i.e., which satisfies the relation

j = δ(i, x).

In order to show precisely which input signals cause the transfer of the automaton from state i into state j, the arrow connecting the corresponding graph vertices is flagged with the symbols of these input signals. The output signal λ(i, x) determined by the pair (i, x) is usually placed on the graph alongside the input signal x and, to distinguish it from the input signals, is enclosed in parentheses.
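Turning the two tables into the flagged arrows of such a graph is mechanical. The sketch below uses a hypothetical helper `mealy_graph` and a made-up two-state automaton; it groups the input signals by arrow and writes each flag as input(output), following the parenthesis convention just described.

```python
from collections import defaultdict

def mealy_graph(delta, lam):
    """Map each arrow (i, j) to its flags "x(y)": the input signals x with
    δ(i, x) = j, each followed by its output y = λ(i, x) in parentheses."""
    arrows = defaultdict(list)
    for (i, x), j in sorted(delta.items()):
        arrows[(i, j)].append(f"{x}({lam[(i, x)]})")
    return dict(arrows)

# Hypothetical two-state automaton with inputs a, b and outputs 0, 1.
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 1}
lam = {(1, "a"): "0", (1, "b"): "1", (2, "a"): "1", (2, "b"): "0"}

for (i, j), flags in mealy_graph(delta, lam).items():
    print(f"{i} -> {j}: {', '.join(flags)}")
```

Here the arrow from 2 to 1 carries two flags, a(1) and b(0), because both input signals lead from state 2 to state 1.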

Let us consider an example of the specification of a finite automaton using switching and output tables and a directed graph. For this purpose we choose the relatively simple automaton with three internal states 1, 2, 3, two input signals x, y and two output signals u, v. We assume that this automaton is specified by the switching and output tables

Switching table:
     1  2  3
x    2  3  3
y    3  2  2

Output table:
     1  2  3
x    u  u  v
y    v  u  u

The directed graph shown in Fig. 8 corresponds to these tables.
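For concreteness, the example automaton can be run on an input word. The Python sketch below transcribes the switching and output tables as read above into dictionaries (our encoding), takes state 1, the leftmost column, as the initial state, and applies relations (44) and (45) letter by letter, also recording the sequence of states visited. The name `induced_map` is hypothetical.

```python
# Switching table δ(a, x) and output table λ(a, x), keyed by (state, input signal).
DELTA = {(1, "x"): 2, (2, "x"): 3, (3, "x"): 3,
         (1, "y"): 3, (2, "y"): 2, (3, "y"): 2}
LAMBDA = {(1, "x"): "u", (2, "x"): "u", (3, "x"): "v",
          (1, "y"): "v", (2, "y"): "u", (3, "y"): "u"}

def induced_map(word, a0=1):
    """Apply relations (44), (45) letter by letter; return the output word
    and the path of states a(0), a(1), ..., a(k)."""
    a, out, path = a0, [], [a0]
    for x in word:
        out.append(LAMBDA[(a, x)])  # y(t) = λ(a(t-1), x(t))
        a = DELTA[(a, x)]           # a(t) = δ(a(t-1), x(t))
        path.append(a)
    return "".join(out), path

print(induced_map("yxyx"))  # ('vvuu', [1, 3, 3, 2, 3])
```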
The  automata  we  have  considered  above  are  customarily  termed 
Mealy  automata  (from  the  name  of  the  scientist  who  first  considered 
several  questions  associated  with  the  functioning  of  such  automata; 
see  [55])*  In  practice  we  frequently  have  to  deal  also  with  somewhat 


differently  defined  automata  which  are  termed  Moore  automata  (see 

[57]). 
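The example automaton given by the switching and output tables above can be simulated directly. The following Python sketch encodes both tables as dictionaries and also lists the labelled arcs of the directed graph in the "x(u)" notation just described. The dictionary encoding is an illustrative choice, and taking state 1 as the initial state is an assumption, since the text does not fix an initial state for this example.

```python
# Mealy automaton from the example tables: states 1, 2, 3; inputs x, y; outputs u, v.
STATES = (1, 2, 3)
INPUTS = ("x", "y")
DELTA = {(1, "x"): 2, (2, "x"): 3, (3, "x"): 3,       # switching table
         (1, "y"): 3, (2, "y"): 2, (3, "y"): 2}
LAMBDA = {(1, "x"): "u", (2, "x"): "u", (3, "x"): "v",  # output table
          (1, "y"): "v", (2, "y"): "u", (3, "y"): "u"}

def run_mealy(word: str, initial: int = 1) -> str:
    """Apply the word letter by letter; y(t) = lambda(a(t-1), x(t))."""
    state, out = initial, []
    for x in word:
        out.append(LAMBDA[(state, x)])
        state = DELTA[(state, x)]
    return "".join(out)

def graph_arcs() -> list[tuple[int, int, str]]:
    """One arc i -> delta(i, x) per (state, input) pair, labelled 'x(y)' with y = lambda(i, x)."""
    return [(i, DELTA[(i, x)], f"{x}({LAMBDA[(i, x)]})")
            for i in STATES for x in INPUTS]

print(run_mealy("xyx"))   # uuu
print(graph_arcs()[0])    # (1, 2, 'x(u)')
```

Arcs that join the same pair of vertices could be merged into one arrow carrying several labels, as in the figure.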


Moore automata differ from Mealy automata only in the way their output functions are defined. In place of the relation

$$y(t) = \lambda(a(t-1), x(t)),$$

which defines the output signal of a Mealy automaton, for a Moore automaton we use the somewhat different relation

$$y(t) = \mu(a(t)). \qquad (46)$$

With the aid of relation (46) and the previously written relation (44), just as in the preceding case, we determine the representation induced by any given Moore automaton.

For reasons which will be considered later, we call the function $y = \mu(a)$ the shifted output function of the Moore automaton. The value of this function for any state $a$ is customarily termed the label of that state. Finite Moore automata are conveniently specified by so-called labelled switching tables. A labelled switching table is nothing other than the conventional switching table of an automaton in which the labels of the states are placed above the state symbols heading the columns of the table. For example, the labelled switching table

         u  u  v
         1  2  3
    x    2  3  3
    y    3  2  2

specifies the Moore automaton which has the same switching table as the Mealy automaton above, with the output signal u corresponding to states 1 and 2 and the output signal v corresponding to state 3. When Moore automata are represented by graphs, the symbols of the output signals label the corresponding vertices of the graph, and not the arrows as in the case of Mealy automata.

We agree that a Moore automaton begins delivering output signals at the instant $t = 1$ (and not at $t = 0$). Under this condition, for any Moore automaton $A_1$ it is not difficult to construct a Mealy automaton $A_2$ which has the same switching table and induces the same representation as $A_1$.

Indeed, if $\delta(a, x)$ is the switching function and $\mu(a)$ the shifted output function of the Moore automaton $A_1$, we can define the Mealy automaton $A_2$ by specifying its switching function $\delta(a, x)$ and its output function $\lambda(a, x) = \mu(\delta(a, x))$. Then

$$y(t) = \lambda(a(t-1), x(t)) = \mu(\delta(a(t-1), x(t))) = \mu(a(t)),$$

which proves that the automata $A_1$ and $A_2$ react identically to any sequence of input signals. This construction is termed the interpretation of the given Moore automaton as a Mealy automaton. The physical meaning of such an interpretation (in real automata) consists in shifting the automaton time by one elementary interval, as a result of which the output signals of the constructed Mealy automaton $A_2$ lead the corresponding output signals of the Moore automaton $A_1$ by one unit of automaton time. It is precisely for this reason that the output functions of Moore automata are termed shifted output functions.
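As a check on this interpretation, the Python sketch below encodes the Moore automaton of the labelled switching table above, builds the Mealy automaton $A_2$ with $\lambda(a, x) = \mu(\delta(a, x))$, and compares the output words of the two automata on all short inputs. Taking state 1 as the initial state, and reading each Moore output from the state just entered (so that outputs start at $t = 1$), are assumptions of the sketch.

```python
from itertools import product

# Moore automaton A1 from the labelled switching table (initial state 1 assumed).
DELTA = {(1, "x"): 2, (2, "x"): 3, (3, "x"): 3,
         (1, "y"): 3, (2, "y"): 2, (3, "y"): 2}
MU = {1: "u", 2: "u", 3: "v"}  # labels of the states

def run_moore(word: str, initial: int = 1) -> str:
    """y(t) = mu(a(t)) for t = 1, 2, ...: the label of the state just entered."""
    state, out = initial, []
    for x in word:
        state = DELTA[(state, x)]
        out.append(MU[state])
    return "".join(out)

# Interpretation as a Mealy automaton A2: same switching table, lambda(a, x) = mu(delta(a, x)).
LAMBDA = {key: MU[target] for key, target in DELTA.items()}

def run_mealy(word: str, initial: int = 1) -> str:
    """y(t) = lambda(a(t-1), x(t))."""
    state, out = initial, []
    for x in word:
        out.append(LAMBDA[(state, x)])
        state = DELTA[(state, x)]
    return "".join(out)

words = ["".join(w) for n in range(1, 7) for w in product("xy", repeat=n)]
print(all(run_moore(w) == run_mealy(w) for w in words))  # True
print(LAMBDA[(1, "x")], LAMBDA[(2, "x")])                 # u v
```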

This time shift makes it possible to regard Moore automata as a particular case of Mealy automata whenever we are interested not in the real time at which a particular output signal appears, but only in the order in which the output signals follow one another. It is exactly this situation that we encounter in the abstract theory of automata, where we are interested only in the representations induced by automata and in the switchings of their memory, and not in how a given automaton is composed of the elementary automata available to us.

In resolving the latter question, which constitutes the subject of the so-called structural theory of automata, Moore automata must be considered as a separate class of automata which is not a subclass of the class of all Mealy automata. The difference between these two classes in structural theory is due to the fact that in a Mealy automaton the output signal arises simultaneously with the input signal which induces it, while in a Moore automaton there is a delay of one unit of automaton time.

The possibility of interpreting every Moore automaton as a Mealy automaton in the abstract theory of automata does not, of course, imply the converse. Nevertheless, for any Mealy automaton A we can construct a Moore automaton B which induces the same representation as A. Here, in contrast with the preceding case, the set of states of automaton B will not, generally speaking, coincide with the set of states of automaton A, although it will be finite whenever the latter set is finite.

Indeed, suppose we are given an arbitrary Mealy automaton A with the set of states $\mathbf{A}$, the input alphabet X, the output alphabet Y, the switching function $\delta(a, x)$, the output function $\lambda(a, x)$ and the initial state $a_0$. For simplicity of notation we agree, here and hereafter, to use the multiplication symbol in place of the switching function, denoting the value $\delta(a, x)$ by the product $ax$.

Let us construct the Moore automaton B, choosing as its set of states $\mathbf{B}$ the set consisting of the initial state $a_0$ and all possible pairs $(a, x)$ with $a \in \mathbf{A}$, $x \in X$. The input and output alphabets of B coincide respectively with the input and output alphabets of A. We define the switching function of B by setting

$$a_0 x = (a_0, x) \quad\text{and}\quad (a, x_i)\,x_j = (a x_i, x_j).$$

We define the shifted output function $\mu(b)$ of the automaton B on each state $b = (a, x)$ different from the initial state $a_0$ by the relation $\mu(b) = \lambda(a, x)$. In the initial state the value of $\mu$ may be chosen arbitrarily. As a result we obtain a certain Moore automaton B.
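The construction of B from A can be sketched in Python. The sketch reuses the example Mealy automaton given earlier by its switching and output tables, with state 1 assumed as the initial state $a_0$; the name `INIT` for the initial state of B and the value chosen for $\mu$ there are illustrative assumptions. The loop at the end checks, on all short words, that A and B induce the same representation, and the reachability count shows B staying within the bound of $1 + |\mathbf{A}|\cdot|X|$ states.

```python
from itertools import product

# Mealy automaton A (example tables; the initial state a0 = 1 is an assumption).
DELTA_A = {(1, "x"): 2, (2, "x"): 3, (3, "x"): 3,
           (1, "y"): 3, (2, "y"): 2, (3, "y"): 2}
LAMBDA_A = {(1, "x"): "u", (2, "x"): "u", (3, "x"): "v",
            (1, "y"): "v", (2, "y"): "u", (3, "y"): "u"}
A0 = 1
INIT = "a0"  # initial state of B; every other state of B is a pair (a, x)

def delta_b(b, x):
    """a0 x = (a0, x) and (a, x_i) x_j = (a x_i, x_j)."""
    if b == INIT:
        return (A0, x)
    a, xi = b
    return (DELTA_A[(a, xi)], x)

def mu_b(b) -> str:
    """mu((a, x)) = lambda(a, x); the value at the initial state is an arbitrary choice."""
    return "u" if b == INIT else LAMBDA_A[b]

def run_a(word: str) -> str:
    state, out = A0, []
    for x in word:
        out.append(LAMBDA_A[(state, x)])
        state = DELTA_A[(state, x)]
    return "".join(out)

def run_b(word: str) -> str:
    state, out = INIT, []
    for x in word:
        state = delta_b(state, x)
        out.append(mu_b(state))  # Moore output: label of the state just entered
    return "".join(out)

words = ["".join(w) for n in range(1, 7) for w in product("xy", repeat=n)]
print(all(run_a(w) == run_b(w) for w in words))  # True

# States of B reachable from INIT.
reachable, frontier = {INIT}, [INIT]
while frontier:
    b = frontier.pop()
    for x in "xy":
        nxt = delta_b(b, x)
        if nxt not in reachable:
            reachable.add(nxt)
            frontier.append(nxt)
print(len(reachable))  # 7 = 1 + 3 * 2
```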

It is not difficult to see that automaton B induces the same representation as automaton A. Indeed, let $\varphi$ denote the representation induced by automaton A and $\psi$ the representation induced by automaton B. Assume that for every input word $p = p_1 x_i$ of length $n \ge 1$ it has already been proved that $\varphi(p) = \psi(p) = q$. (For input words of length 1, i.e., for any single-letter word $x$, obviously $\varphi(x) = \psi(x) = \lambda(a_0, x)$.)

Let us consider the reaction of both automata to an arbitrary word $p x_j$ of length $n + 1$. Here and hereafter we agree to denote by $a\ell$ the state into which an automaton initially in state $a$ transfers when the word $\ell$ is applied to its input sequentially, letter by letter. By the definition of the switching function of B, $a_0 p = (a_0 p_1, x_i)$ (on the left, $a_0 p$ is the state of B; on the right, $a_0 p_1$ is a state of A). When the input signal $x_j$ is then applied to the automata A and B, automaton A delivers the output signal $y = \lambda(a_0 p, x_j)$, where $a_0 p$ is the state of A. Automaton B obviously transfers into the state

$$b = (a_0 p_1 x_i, x_j) = (a_0 p, x_j)$$

and delivers the output signal $\mu(b)$, which, by the definition of the function $\mu$, equals $\lambda(a_0 p, x_j)$.

Thereby it is shown that the automata A and B react identically to any input word of length $n + 1$. By induction on $n$ we conclude that the representations induced by the automata A and B are identical. This conclusion holds not only for conventional (completely determinate) automata, but also for partial automata.

Let us characterize in more detail the representations induced by automata. We note that the requirement that an input signal arrive and an output signal depart at every instant of automaton time, which at first glance is not satisfied by any specific automaton, is in fact easily satisfied if we introduce special letters to designate empty input and output signals (i.e., the absence of any real physical signal) and treat these letters on a par with the other letters of the input and output alphabets.

It is easy to see that the representation $\varphi$ induced by an arbitrary Moore or Mealy automaton satisfies two conditions:

1) to any word $\ell$ in the input alphabet X the representation $\varphi$ associates a word $\varphi(\ell)$ in the output alphabet Y of the same length as $\ell$;

2) if the word $\ell_1$ coincides with an initial segment of the word $\ell$, then the word $\varphi(\ell_1)$ is an initial segment of the word $\varphi(\ell)$.

Let us term the conditions just formulated the automaticity conditions of the representation $\varphi$, and every correspondence between words in the alphabets X and Y which satisfies these conditions an automaton representation or automaton operator.

It is not difficult to show that every automaton representation can be induced by some abstract automaton (not necessarily finite).

Let the automaton correspondence $\varphi$ map the set of words in the alphabet $X = (x_1, x_2, \ldots, x_n)$ into the set of words in the alphabet $Y = (y_1, y_2, \ldots, y_m)$. Let us construct the automaton A whose internal states are all possible words in the alphabet X and whose initial state is the empty word $e$ (the word of zero length, consisting of the empty set of letters). The switching function $\delta$ is defined trivially: if $\ell$ is any state of the automaton (a word in the alphabet X) and $x_i$ is any input signal, then $\delta(\ell, x_i)$ is taken to be the word $\ell x_i$. Defining the output function $\lambda$ by the relation $\lambda(\ell, x_i) = y_j$, where $y_j$ is the last letter of the word $\varphi(\ell x_i)$, we obtain an automaton which realizes the original mapping $\varphi$.
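The construction just described, in which the states are the input words themselves, can be sketched in Python. The mapping `phi` below is a hypothetical automaton mapping chosen only for illustration (the k-th output letter is u if the first k input letters contain an even number of x's, and v otherwise); any function satisfying both automaticity conditions could be substituted. The resulting automaton is infinite, since every word is a state.

```python
from itertools import product

def phi(word: str) -> str:
    """Hypothetical automaton mapping over X = {x, y}, Y = {u, v}."""
    out, parity = [], 0
    for letter in word:
        if letter == "x":
            parity ^= 1
        out.append("v" if parity else "u")
    return "".join(out)

def delta(state: str, x: str) -> str:
    """Switching function: the state is the input word read so far."""
    return state + x

def lam(state: str, x: str) -> str:
    """Output function: the last letter of phi(state x)."""
    return phi(delta(state, x))[-1]

def run(word: str, initial: str = "") -> str:
    """Run from the empty word e, represented here by the empty string."""
    state, out = initial, []
    for x in word:
        out.append(lam(state, x))
        state = delta(state, x)
    return "".join(out)

words = ["".join(w) for n in range(6) for w in product("xy", repeat=n)]
print(all(run(w) == phi(w) for w in words))  # True
print(run("xyx"))                             # vvu
```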

If the mapping $\varphi$ of the set of words in the alphabet X into the set of words in the alphabet Y is induced by a partial automaton, then it is, of course, only a partial mapping, not defined on all words. Nevertheless, both automaticity conditions remain satisfied for this mapping under the additional assumption that $\varphi(\ell)$ exists. In this case the second automaticity condition takes a stronger form: if $\varphi(\ell)$ exists and $\ell_1$ is an initial segment of the word $\ell$, then $\varphi(\ell_1)$ exists and coincides with an initial segment of the word $\varphi(\ell)$.

We shall term these rephrased conditions the automaticity conditions of the partial mapping $\varphi$, and every partial mapping satisfying these conditions will be termed a partial automaton mapping.

It is easy to establish the validity of the following proposition.

Theorem 1. Every partial automaton mapping can be induced by some partial automaton (not necessarily finite).

This proposition is proved by exactly the same method as in the case of a complete mapping. The difference is that the states of the partial automaton are taken to be not all words in the input alphabet, but only those on which the mapping $\varphi$ is defined.

At first glance the automaticity conditions severely narrow the class of mappings which can be specified by abstract automata. In particular, it is well known that the requirement of equal lengths for input and output words is not satisfied by a large portion of the algorithms which particular specific automata must realize. This difficulty, which seems very serious at first glance, is in fact easily removed by recoding the input and output information with a very simple technique.

The standard technique for converting any partial correspondence $\varphi$ between words in the alphabets X and Y into a partial automaton correspondence is based on introducing into the alphabets X and Y a letter a which they did not previously contain. The letter a is termed the empty letter. The appearance of the empty letter at the automaton input corresponds to the case when nothing is actually applied to the input. Similarly, the appearance of the empty letter as an output signal signifies the absence of any signal at the automaton output.

Let us consider an arbitrary word $\ell$ of length $n$ in the alphabet X with which the initially specified partial mapping $\varphi$ associates the word $q = \varphi(\ell)$ of length $m$ in the alphabet Y. Let $\ell_1$ denote the word in the alphabet $X_1 = X \cup \{a\}$ obtained by appending $m$ copies of the letter a to the right of $\ell$. Similarly, let $q_1$ denote the word in the alphabet $Y_1 = Y \cup \{a\}$ obtained by prefixing $n$ copies of the letter a to the left of $q$. The words $\ell_1$ and $q_1$ then have the same length $n + m$. We term this technique the standard technique for equalizing word lengths.

Let us define a new partial mapping $\varphi_1$ between words in the alphabets $X_1$ and $Y_1$ by setting $q_1 = \varphi_1(\ell_1)$, repeating this construction for every word $\ell$ in the alphabet X on which the mapping $\varphi$ is defined. We further define this correspondence on all initial segments of the word $\ell_1$, assuming that for each initial segment $p$ of $\ell_1$ the word $\varphi_1(p)$ coincides with the initial segment of $q_1$ whose length equals that of $p$.

With this extension there arises the danger of losing the uniqueness of the mapping $\varphi_1$, since a word $p$ can occur not only as an initial segment of the original word $\ell_1$ but also as an initial segment of another word, for example of the word $s_1$ obtained by applying the standard technique for equalizing word lengths to some word $s$ in the alphabet X.

Since the word $s_1$ has the form $s_1 = s\,aa\ldots a$ and the word $\ell_1$ has the form $\ell_1 = \ell\,aa\ldots a$, where the words $s$ and $\ell$ do not contain the letter a, it follows that $s = \ell$ whenever the common initial segment $p$ ends in at least one letter a, i.e., $p = p'a\ldots a$ with $p'$ free of the letter a (in that case $p' = s = \ell$). In this case, consequently, the words $s_1$ and $\ell_1$ coincide and there is no danger of ambiguity.

It thus remains to consider the case when the word $p$ consists exclusively of letters of the alphabet X. In this case the length of $p$ clearly does not exceed the lengths of the words $\ell$ and $s$. But then, by the standard technique for equalizing word lengths, the initial segments of the words $\varphi_1(\ell_1)$ and $\varphi_1(s_1)$ whose length equals that of $p$ consist entirely of letters a and, consequently, coincide. Thus ambiguity is excluded in this case as well.

The partial mapping $\varphi_1$ between words in the alphabets $X_1$ and $Y_1$ constructed in this way satisfies both automaticity conditions for partial mappings by the very method of its construction and is, consequently, the required partial automaton mapping.
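The standard technique and the automaticity check can be tried out on a finite table of pairs. In this Python sketch the table `phi` is a hypothetical example: it sends each word of length 1 to 3 over X = {x, y} to its mirror image over Y = {u, v} (renaming x to u and y to v), a mapping which preserves lengths but not initial segments and is therefore not automatic. The empty letter is written as the character `a`, as in the text; the helper names are illustrative.

```python
from itertools import product

EMPTY = "a"  # the added empty letter; it must not occur in X or Y

def equalize(word: str, image: str) -> tuple[str, str]:
    """Standard technique: m empty letters to the right of the input,
    n empty letters to the left of the output (n = |word|, m = |image|)."""
    return word + EMPTY * len(image), EMPTY * len(word) + image

def make_automatic(phi: dict[str, str]) -> dict[str, str]:
    """phi_1 on every nonempty initial segment of every padded input word."""
    phi1: dict[str, str] = {}
    for word, image in phi.items():
        w1, q1 = equalize(word, image)
        for k in range(1, len(w1) + 1):
            # The argument in the text shows that this never detects a conflict.
            if phi1.setdefault(w1[:k], q1[:k]) != q1[:k]:
                raise ValueError(f"ambiguous value for {w1[:k]!r}")
    return phi1

def is_automatic(table: dict[str, str]) -> bool:
    """Both automaticity conditions for a finite partial mapping."""
    for word, image in table.items():
        if len(word) != len(image):
            return False
        if any(table.get(word[:k]) != image[:k] for k in range(1, len(word))):
            return False
    return True

# Hypothetical example: mirror image with x -> u, y -> v, on words of length 1 to 3.
RENAME = {"x": "u", "y": "v"}
phi = {"".join(w): "".join(RENAME[c] for c in reversed(w))
       for n in range(1, 4) for w in product("xy", repeat=n)}

print(is_automatic(phi))          # False
phi1 = make_automatic(phi)
print(is_automatic(phi1))         # True
print(equalize("xy", phi["xy"]))  # ('xyaa', 'aavu')
```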

The technique described for transforming any partial mapping into an automaton mapping is universal; however, precisely because of its universality it does not always lead to the most economical solution (from the point of view of the use of additional letters). This is particularly easy to see when the original partial mapping $\varphi$ itself satisfies both automaticity conditions. Clearly the most economical solution in this case is $\varphi_1 = \varphi$. However, the standard method described above (which can, of course, be applied in this case as well) leads to an unnecessary increase in the lengths of the words participating in the correspondence.

Thus the universal technique does not remove the need to look for more economical solutions. Such solutions are usually found by adding empty letters to the words gradually, step by step, rather than all at once in the quantity prescribed by the standard technique for equalizing word lengths, checking at each step whether the automaticity conditions are satisfied and stopping as soon as they are satisfied for the first time. Such an improved technique for equalizing word lengths leads sooner or later to an automaton mapping.
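One possible reading of this step-by-step procedure is sketched below in Python; it is an assumption for illustration, not the author's precise algorithm. At step k every input word $\ell$ receives k empty letters on the right and its image receives $|\ell| + k - |\varphi(\ell)|$ empty letters on the left, so that lengths stay equal; the first k for which the extended table is unambiguous and automatic is returned. For a finite table the loop stops no later than $k = \max |\varphi(\ell)|$, by the same argument as for the standard technique. The two example tables are hypothetical.

```python
from itertools import product
from typing import Optional

EMPTY = "a"  # the empty letter

def is_automatic(table: dict[str, str]) -> bool:
    """Both automaticity conditions for a finite partial mapping."""
    return all(len(w) == len(q) and
               all(table.get(w[:k]) == q[:k] for k in range(1, len(w)))
               for w, q in table.items())

def pad_by(phi: dict[str, str], k: int) -> Optional[dict[str, str]]:
    """Pad every input by k empty letters; return None if this k does not work."""
    table: dict[str, str] = {}
    for word, image in phi.items():
        left = len(word) + k - len(image)
        if left < 0:
            return None  # the image would be longer than the padded input
        w1, q1 = word + EMPTY * k, EMPTY * left + image
        for j in range(1, len(w1) + 1):
            if table.setdefault(w1[:j], q1[:j]) != q1[:j]:
                return None  # two words disagree on a common initial segment
    return table if is_automatic(table) else None

def economical(phi: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Smallest uniform k that yields an automaton mapping (finite phi assumed)."""
    k = 0
    while (table := pad_by(phi, k)) is None:
        k += 1
    return k, table

RENAME = {"x": "u", "y": "v"}
words = ["".join(w) for n in range(1, 4) for w in product("xy", repeat=n)]
copy = {w: "".join(RENAME[c] for c in w) for w in words}              # already automatic
mirror = {w: "".join(RENAME[c] for c in reversed(w)) for w in words}  # not automatic

print(economical(copy)[0])    # 0: phi_1 = phi, no empty letters needed
print(economical(mirror)[0])  # 2: the standard technique pads length-3 words by 3
```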

Of considerable interest is the problem of finding an economical recoding of a mapping given in a particular algorithmic language (for example, in the language of normal algorithms) in order to convert it into an automaton correspondence, and also the problem of constructing a theory of algorithms which satisfy the automaticity conditions and are therefore termed automaton algorithms for short. One possible approach to the theory of automaton algorithms is developed in the following section.

§2. EVENTS AND REPRESENTATION OF EVENTS IN AUTOMATA

Let A be an arbitrary (generally speaking, partial) initial automaton and $\varphi$ the mapping induced by it. For each letter $y_i$ of the output alphabet $Y = (y_1, y_2, \ldots, y_m)$ of automaton A, let us consider the set $R_i$ of all words $\ell$ in the input alphabet $X = (x_1, x_2, \ldots, x_n)$ of this automaton for which the word $\varphi(\ell)$ is defined and ends with the letter $y_i$.

Let us term the set $R_i$ thus defined the event represented in the (partial) automaton A by the output signal $y_i$ ($i = 1, 2, \ldots, m$). If M is any set of output signals, then we shall term the union of the events represented by all elements of this set the event represented in the partial automaton A by the set M.

It is easy to see that the sets $R_i$ are pairwise disjoint and that the set S of all words in the alphabet X which occur in none of the sets $R_i$ ($i = 1, 2, \ldots, m$) consists of all words forbidden for the given partial automaton. Here and hereafter we use the term forbidden for every word in the input alphabet which, when applied to the input of the given partial automaton, leads for at least one of its input signals to an output signal which is not defined in the automaton. We agree to call the set S of all forbidden words the forbidden domain of the given partial automaton A. We also agree to term any set of words in the alphabet X an event in this alphabet.
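To make these definitions concrete, the Python sketch below computes, for all nonempty input words up to a fixed length, the events $R_u$ and $R_v$ and the forbidden words of a partial automaton. The automaton is a hypothetical partial variant of the example with switching table x: 2 3 3, y: 3 2 2 and output table x: u u v, y: v u u, taken with initial state 1, in which the transition and output for state 3 and input y are deliberately left undefined. The length bound is also an assumption, since the events themselves are infinite; the empty word is left out, in line with the convention adopted below.

```python
from itertools import product
from typing import Optional

# Hypothetical partial automaton: the pair (state 3, input y) is left undefined.
DELTA = {(1, "x"): 2, (2, "x"): 3, (3, "x"): 3, (1, "y"): 3, (2, "y"): 2}
LAMBDA = {(1, "x"): "u", (2, "x"): "u", (3, "x"): "v", (1, "y"): "v", (2, "y"): "u"}

def induced(word: str, initial: int = 1) -> Optional[str]:
    """phi(word), or None when some output along the way is undefined (word forbidden)."""
    state, out = initial, []
    for x in word:
        if (state, x) not in LAMBDA:
            return None
        out.append(LAMBDA[(state, x)])
        state = DELTA[(state, x)]
    return "".join(out)

def events(max_len: int) -> tuple[dict[str, set[str]], set[str]]:
    """R[y]: words whose image ends with y; S: forbidden words (lengths 1..max_len)."""
    R: dict[str, set[str]] = {"u": set(), "v": set()}
    S: set[str] = set()
    for n in range(1, max_len + 1):
        for w in map("".join, product("xy", repeat=n)):
            image = induced(w)
            if image is None:
                S.add(w)
            else:
                R[image[-1]].add(w)
    return R, S

R, S = events(3)
print(sorted(S))       # ['xxy', 'yxy', 'yy', 'yyx', 'yyy']
print(sorted(R["v"]))  # ['xxx', 'y', 'yx', 'yxx']
all_words = {"".join(w) for n in range(1, 4) for w in product("xy", repeat=n)}
print((R["u"] | R["v"] | S) == all_words and not (R["u"] & R["v"]))  # True
```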
From the definitions introduced, we can formulate the result obtained above in the form of the following proposition.

Theorem 1. Specifying the partial automaton mapping $\varphi$ realized by the partial automaton A with the input alphabet $X = (x_1, x_2, \ldots, x_n)$ and the output alphabet $Y = (y_1, y_2, \ldots, y_m)$ uniquely determines a partition of the set F of all words in the alphabet X into $m + 1$ disjoint events in the alphabet X, namely into the events $R_1, R_2, \ldots, R_m$ represented in the automaton A by the output signals $y_1, y_2, \ldots, y_m$, and the forbidden domain S of the given (partial) automaton A.

Conversely, knowing the events $R_1, R_2, \ldots, R_m$ represented in some partial automaton A by the output signals $y_1, y_2, \ldots, y_m$, we can uniquely recover the partial mapping $\varphi$ between words in the input alphabet X and the output alphabet Y realized by this automaton, without using the switching and output functions of the automaton.

Let there be given an arbitrary word $\ell = x_{i_1} x_{i_2} \ldots x_{i_n}$ in the alphabet X. For each $k$ ($1 \le k \le n$) we find the output signal $y_{j_k}$ by the rule: $y_{j_k}$ is the output signal representing in the automaton A the event $R_{j_k}$ which contains the initial segment $x_{i_1} x_{i_2} \ldots x_{i_k}$ of length $k$ of the word $\ell$. If the corresponding $y_{j_k}$ exist for all $k = 1, 2, \ldots, n$, then we set

$$\varphi(\ell) = \varphi(x_{i_1} x_{i_2} \ldots x_{i_n}) = y_{j_1} y_{j_2} \ldots y_{j_n}.$$

If an output signal $y_{j_k}$ with the required property does not exist for even one $k = 1, 2, \ldots, n$, we assume that the partial mapping $\varphi$ is not defined on the word $\ell$.

It is not difficult to see that, by the definition of the events represented in an automaton, the partial mapping $\varphi$ introduced in this fashion is precisely the partial mapping induced by the given partial automaton A.


On the basis of this discussion we can formulate the following proposition.

Theorem 2. Specifying a partial automaton mapping $\varphi$ between words in the alphabets X and $Y = (y_1, y_2, \ldots, y_m)$ is equivalent to specifying the events $R_1, R_2, \ldots, R_m$ represented by the output signals $y_1, y_2, \ldots, y_m$ in the partial automaton A which induces the mapping $\varphi$.
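The recovery rule used in the proof of this equivalence can be sketched in Python. Here the events are given only as membership tests, with no switching or output function involved. The particular events are a hypothetical example over X = {x, y} and Y = {u, v}: in `EVENTS`, $R_u$ holds the nonempty words with an even number of x's and $R_v$ those with an odd number, so nothing is forbidden; `PARTIAL` additionally forbids every word containing yy.

```python
from typing import Callable, Optional

Event = Callable[[str], bool]  # an event given only by a membership test

def recover(events: dict[str, Event], word: str) -> Optional[str]:
    """phi(word) = y_{j_1} ... y_{j_n}, where y_{j_k} represents the event containing
    the initial segment of length k; None if phi is not defined on the word."""
    out = []
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        signals = [y for y, contains in events.items() if contains(prefix)]
        if not signals:
            return None          # the prefix lies in the forbidden domain
        out.append(signals[0])   # the events are disjoint, so at most one signal fits
    return "".join(out)

# Hypothetical events.
EVENTS = {
    "u": lambda w: len(w) > 0 and w.count("x") % 2 == 0,
    "v": lambda w: w.count("x") % 2 == 1,
}
PARTIAL = {
    "u": lambda w: "yy" not in w and len(w) > 0 and w.count("x") % 2 == 0,
    "v": lambda w: "yy" not in w and w.count("x") % 2 == 1,
}

print(recover(EVENTS, "xyx"))   # vvu
print(recover(EVENTS, "yy"))    # uu
print(recover(PARTIAL, "xyy"))  # None
```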

Theorem 2 lays the foundation for the study of automaton mappings (in particular, of automaton algorithms). To describe such mappings it is sufficient to specify a partition of the set of all words in the input alphabet into a finite number of disjoint events. In order for the corresponding descriptions to be constructive, we must restrict ourselves to those events which admit an effective description.

Naturally, it is first of all the finite events, i.e., events consisting of a finite number of words, that admit a simple constructive description: they can be described by listing the words which belong to them. To characterize some important classes of infinite events, it is advisable to introduce several operations on the set of events, thereby transforming this set into an algebra, the algebra of events.

For our purposes the most convenient is a system of three operations which is a modification of the operations first introduced by Kleene [40] (see also Copi, Elgot, Wright [45] and Glushkov [21]).

The first operation is the set-theoretic union of events. We shall denote this operation by the symbol $\vee$ and term it event disjunction.

The second operation is event multiplication, which is not to be confused with set-theoretic intersection. If the event S consists of the words $p_\alpha$ ($\alpha \in M$) and the event R consists of the words $q_\beta$ ($\beta \in N$), then the product of the events S and R is the event consisting of all possible words of the form $p_\alpha q_\beta$ ($\alpha \in M$, $\beta \in N$). Event multiplication is noncommutative: generally speaking, the events SR and RS are different.

The third operation is the so-called event iteration, for which we shall use braces as the notation, so that {S} denotes the iteration of the event S. The iteration of any event S is defined as the union of the empty word, the event $S = S^1$, the event $S \cdot S = S^2$, the event $S \cdot S \cdot S = S^3$, and so on to infinity. In other words, if the event S consists of the words $p_\alpha$ ($\alpha \in M$), then its iteration {S} consists of all possible words of the form $p_{\alpha_1} p_{\alpha_2} \ldots p_{\alpha_n}$, where $\alpha_1, \alpha_2, \ldots, \alpha_n \in M$ and $n = 0, 1, 2, 3, \ldots$.
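For finite events the three operations can be computed directly. In this Python sketch events are sets of strings and the empty word e is the empty string. Because an iteration is in general infinite, the hypothetical helper `iterate` returns only the words of {S} whose length does not exceed a chosen bound; that truncation is an assumption of the sketch, not part of the definition.

```python
def disjunction(s: set[str], r: set[str]) -> set[str]:
    """S v R: set-theoretic union."""
    return s | r

def product(s: set[str], r: set[str]) -> set[str]:
    """SR: every word of S followed by every word of R."""
    return {p + q for p in s for q in r}

def iterate(s: set[str], max_len: int) -> set[str]:
    """Words of {S} = e v S v SS v SSS v ... of length at most max_len."""
    result, layer = {""}, {""}
    while True:
        # Extend only the words found in the previous round; keep those within the bound.
        layer = {w for w in product(layer, s) if len(w) <= max_len} - result
        if not layer:
            return result
        result |= layer

S, R = {"x", "yy"}, {"y"}
print(sorted(disjunction(S, R)))                      # ['x', 'y', 'yy']
print(sorted(product(S, R)), sorted(product(R, S)))   # ['xy', 'yyy'] ['yx', 'yyy']
print(sorted(iterate(S, 3)))  # ['', 'x', 'xx', 'xxx', 'xyy', 'yy', 'yyx']
```

The second line also illustrates that SR and RS are in general different events.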

We shall term the braces used to denote iteration iteration brackets. To indicate the order of operations we shall use round brackets, which we term conventional brackets. In the absence of brackets altering the usual order of operations, iteration is performed first, then multiplication, and finally disjunction.

We agree to designate single-element events, i.e., events consisting of a single word, by the symbol of that word. If $X = (x_1, x_2, \ldots, x_n)$, then the $n + 1$ single-element events $x_1, x_2, \ldots, x_n, e$ are termed the elementary events in this alphabet.

Here and in the future we shall use the letter e to denote the empty word, which consists of the empty set of letters and consequently has zero length. This word plays only an auxiliary role. In particular, we agree not to regard events which differ from one another only by the empty word as different. Thus the empty word can, as desired, be either added to or removed from any event in question. This is because, under the definitions we have adopted, the empty word cannot be represented in an automaton.

We  shall  now  introduce  a  concept  which  is  central  to  all  the 
subsequent  considerations. 

Any event which can be obtained from the elementary events $x_1, x_2, \ldots, x_n, e$ in the finite alphabet $X = (x_1, x_2, \ldots, x_n)$ by applying a finite number of operations of disjunction, multiplication and iteration is termed a regular event in this alphabet.

This definition goes back to the definition of a regular event given earlier by Kleene [40], although it differs considerably in form (see Glushkov [21]). We note that the same event can be expressed in terms of the elementary events in different ways. In what follows we shall term each such expression (formula of the event algebra) a regular expression.

One of the primary problems of event algebra is to establish the laws of equivalent transformations of regular expressions, i.e., of those transformations which do not change the events represented by these expressions (up to the empty word e).

Among the laws used very frequently in equivalent transformations in event algebra are the associative laws for disjunction and multiplication, the commutative law for disjunction, and the left and right distributive laws for multiplication with respect to disjunction, $S(R \vee Q) = SR \vee SQ$ and $(R \vee Q)S = RS \vee QS$, among others.

The distributive laws make it possible, in particular, to remove brackets and to take common factors outside the brackets (as in conventional algebra). Here we need only recall that multiplication in event algebra is, generally speaking, not commutative.
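The distributive and associative laws, and the failure of commutativity for multiplication, can be checked mechanically on small finite events. The Python sketch below tests them on randomly generated events over X = {x, y}; the generator, its size parameters and the fixed seed are illustrative assumptions, and a finite check of this kind only illustrates the laws rather than proving them.

```python
import random

def mul(s: frozenset, r: frozenset) -> frozenset:
    """Event multiplication: every word of s followed by every word of r."""
    return frozenset(p + q for p in s for q in r)

def random_event(rng: random.Random) -> frozenset:
    """A small random finite event over X = {x, y} (it may contain the empty word)."""
    return frozenset("".join(rng.choice("xy") for _ in range(rng.randint(0, 3)))
                     for _ in range(rng.randint(1, 4)))

rng = random.Random(0)
for _ in range(1000):
    S, R, Q = (random_event(rng) for _ in range(3))
    assert mul(S, R | Q) == mul(S, R) | mul(S, Q)  # S(R v Q) = SR v SQ
    assert mul(R | Q, S) == mul(R, S) | mul(Q, S)  # (R v Q)S = RS v QS
    assert mul(mul(S, R), Q) == mul(S, mul(R, Q))  # associativity of multiplication

S, R = frozenset({"x"}), frozenset({"y"})
print(mul(S, R) == mul(R, S))  # False: SR = {xy}, RS = {yx}
```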
Any word can be represented as the product of elementary events, namely the individual letters constituting the word. Any finite event can be represented as the disjunction of the words composing it. This implies, in particular, that all finite events are regular.

The use of iteration leads to infinite regular events. At the same time it is not difficult to construct simple examples of nonregular infinite events. For this it is sufficient to choose an increasing sequence of whole numbers $n_1, n_2, \ldots, n_i, \ldots$ such that the differences $n_{i+1} - n_i$ ($i = 1, 2, \ldots$) are unbounded (this condition is satisfied, for instance, by the sequence of squares of the natural numbers) and, in any input alphabet X, to construct the event S consisting of all words in the alphabet X whose lengths equal $n_1$, $n_2$, and so on.

The event S constructed in this way is necessarily nonregular. Indeed, assuming the contrary, we could find some regular expression R for S. Since the event S is infinite, this expression contains at least one pair of iteration brackets enclosing an expression different from the empty word e. Let us replace all the remaining iteration brackets in the expression R by the empty word, and the chosen brackets by the expression {p}, where p is an arbitrary nonempty word from the event enclosed in the chosen brackets. As a result we obtain a regular expression $R_1$ for some event contained in the event S.

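From the expression $R_1$ it follows directly that the event S contains words of the form rs, rps, rpps, rppps, ..., whose lengths form an infinite increasing arithmetic progression with difference equal to the length of p. But this contradicts the method of construction of the event S: since the differences $n_{i+1} - n_i$ are unbounded, there is a gap between consecutive terms $n_i \ge |rs|$ and $n_{i+1}$ that exceeds the length of p, and some term of the progression must fall strictly inside this gap. Consequently, the event S cannot be represented by any regular expression, i.e., it is a nonregular event.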
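For the sequence of squares the contradiction can be observed directly. In the Python sketch below (the function name is our own, purely illustrative), a and d stand for the lengths of rs and p, and the function returns the first term of the progression a + kd that is not a perfect square. The loop always terminates, because the gaps between consecutive squares eventually exceed d.

```python
from math import isqrt


def first_non_square(a: int, d: int) -> tuple[int, int]:
    """Return (k, a + k*d) for the smallest k >= 0 such that a + k*d is not a square.

    a = |rs| >= 0 and d = |p| > 0 play the roles of the word lengths in the
    proof above; termination follows from the unbounded gaps between squares.
    """
    if a < 0 or d <= 0:
        raise ValueError("need a >= 0 and d > 0 (p is a nonempty word)")
    k = 0
    while True:
        n = a + k * d
        if isqrt(n) ** 2 != n:  # n lies strictly between two consecutive squares
            return k, n
        k += 1
```

For instance, first_non_square(1, 3) returns (2, 7): the progression 1, 4, 7, ... leaves the squares at its third term.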


Let us also define the concept of the cyclic depth of a regular expression, meaning by this the maximal number of pairs of iteration brackets nested within one another that occur in this expression. For example, the expression $\{x\{y\}\{x\}\}$ has a cyclic depth of 2, while the expression $\{x \vee y\}x\{y\}$ has a cyclic depth of 1. By the cyclic depth of a regular event we shall understand the minimal cyclic depth of the regular expressions representing it.
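The cyclic depth of a particular expression (as opposed to that of an event, which requires a minimum over all equivalent expressions) is simply the maximal nesting of iteration brackets. The following is a minimal Python sketch, assuming expressions are written as strings in which braces are used only as iteration brackets:

```python
def cyclic_depth(expr: str) -> int:
    """Maximal nesting of iteration brackets {...} in the expression string.

    Assumes '{' and '}' serve only as iteration brackets; letters, the
    disjunction sign and ordinary parentheses do not affect the result.
    """
    depth = best = 0
    for ch in expr:
        if ch == "{":
            depth += 1
            best = max(best, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced iteration brackets")
    if depth != 0:
        raise ValueError("unbalanced iteration brackets")
    return best
```

Both examples from the text give cyclic_depth("{x{y}{x}}") == 2 and cyclic_depth("{x∨y}x{y}") == 1.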

Regular events are of particular importance for the abstract theory of automata, since the class of regular events coincides with the class of events representable in finite automata. In the following sections we shall prove this important proposition; here we shall consider the relationship between the classes of events representable in Mealy and Moore automata.

The general definition of the representation of events in an automaton, given at the beginning of the present section, referred to the Mealy automaton. Since the Moore automaton is a particular case of the Mealy automaton, this definition applies fully to it as well. However, in practice it is convenient for Moore automata to represent events not by a property of the output at the instant the last input signal of a word of the event is applied, but by a property of the state of the automaton after a word of the given event has arrived at its input.

In other words, it is customary to consider that in the case of Moore automata the events are represented by sets of automaton states.

By virtue of the definition of the Moore automaton, this method of representing events is completely equivalent to representing them by sets of output signals. The only difference is that when events are represented by sets of automaton states, the empty word e is representable (by means of the initial state), while e cannot be represented by any output signal (unless, of course, we begin counting time from negative instants).

However,  we  have  agreed  above  not  to  consider  events  differing 
from  one  another  only  by  the  empty  word  e  as  different.  Therefore  the 
two  methods  of  representation  of  events  (states  or  output  signals) 
in  the  case  of  the  Moore  automata  are  actually  equivalent. 

Since Moore automata can be considered in the abstract theory as a particular case of Mealy automata, it seems natural to expect the class of events represented in Moore automata to be narrower than the class of events represented in Mealy automata. In reality this is not so.

Let us assume that some event S is represented in some Mealy automaton A by the set M of its output signals. It is not difficult to see that the event S can be represented by some set of internal states of the Moore automaton B (inducing the same mapping $\varphi$ as the automaton A) that was constructed in the preceding section.

We recall that the states of the automaton B are all possible pairs (a, x) composed of the states a of the automaton A and the letters x of its input alphabet X, together with the initial state $a_0$ of the automaton A. The shifted output function $\mu$ of the automaton B is defined arbitrarily on the initial state $a_0$, while on a state b = (a, x) it is defined by the relation $\mu(b) = \lambda(a, x)$, where $\lambda(a, x)$ is the output function of the automaton A.

If $h = g x_j$ is an arbitrary nonempty input word, then in the automaton A the last letter of the corresponding output word will obviously be $y = \lambda(a_0 g, x_j)$, where $a_0 g$ denotes the state into which the word g transfers A from the initial state. The automaton B, as is not difficult to see, will be transferred by the word h from the initial state $a_0$ into the state $(a_0 g, x_j)$.

Thus, all the nonempty words of the original event will be represented in the automaton B by the set K of all states $(a_i, x_j)$ for which the relation $\lambda(a_i, x_j) \in M$ holds.
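The construction of B can be sketched in Python as follows. Switching and output tables are dictionaries keyed by (state, letter). The tagged initial state, the function name and the choice of None as the arbitrary output on the initial state are assumptions of this sketch and do not come from the text.

```python
def mealy_to_moore(states, alphabet, delta, lam, a0):
    """Moore automaton B whose states are a0 and the pairs (a, x).

    delta[(a, x)] and lam[(a, x)] are the switching and output functions of
    the Mealy automaton A. Returns (delta_b, mu, b0): the switching function
    of B, its shifted output function and its initial state.
    """
    b0 = ("init", a0)  # tagged copy of a0 so it cannot clash with a pair (a, x)
    delta_b = {(b0, x): (a0, x) for x in alphabet}
    mu = {b0: None}  # the output on the initial state is arbitrary
    for a in states:
        for x in alphabet:
            b = (a, x)
            mu[b] = lam[(a, x)]  # mu(b) = lambda(a, x)
            for x2 in alphabet:
                # B remembers the state A reached and the newest input letter
                delta_b[(b, x2)] = (delta[(a, x)], x2)
    return delta_b, mu, b0
```

After any nonempty word, the output μ of the state reached in B coincides with the last output of A, which is the property used in the argument above.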

§3.  ANALYSIS  OF  FINITE  AUTOMATA 

The  analysis  problem  amounts  to  the  determination  of  the  events 
represented  in  the  automaton  by  sets  of  output  signals  (in  the  case  of 
the  Mealy  automata)  or  by  sets  of  the  internal  states  (in  the  case  of 
the  Moore  automata).  Since  every  Moore  automaton  can  be  interpreted 
as  a  Mealy  automaton,  it  is  sufficient  to  learn  to  analyze  only  the 
Mealy  automata. 

We shall resolve the analysis problem only for the case of finite Mealy automata. All events represented in such automata are necessarily regular. The analysis algorithm is applied to the switching and output tables of the automaton being analyzed and produces, as its final result, the regular expressions for the events representable by each of the output signals of the automaton. An event representable by an arbitrary set of output signals is then written as the disjunction of the events represented by the individual output signals composing the given set.

Let us consider an arbitrary finite Mealy automaton A with the set of internal states $(a_1, a_2, \ldots, a_p)$, the input alphabet $X = (x_1, x_2, \ldots, x_n)$ and the output alphabet $Y = (y_1, y_2, \ldots, y_m)$.

Taking as given the initial state $a_1$, the switching function $\delta(a_i, x_j)$ and the output function $\lambda(a_i, x_j)$ of the automaton A, we shall look for the regular expression R for the event represented by some output signal, say the signal $y_1$. We write out the internal states $a_1, a_{j_1}, \ldots, a_{j_k}$ into which the automaton A is transferred from the initial state $a_1$ by the successive initial segments

$$e,\; x_{i_1},\; x_{i_1}x_{i_2},\; \ldots,\; x_{i_1}x_{i_2}\cdots x_{i_k}$$

of some input word $q = x_{i_1}x_{i_2}\cdots x_{i_k}$.


Inserting the symbols of the states obtained into the word q after the corresponding initial segments, we transform this word into the new word

$$q' = a_1 x_{i_1} a_{j_1} x_{i_2} \cdots a_{j_{k-1}} x_{i_k} a_{j_k},$$

which we agree to call the path corresponding to the word q. Deleting from a given path the symbols of the internal states, we obtain the input word corresponding to that path.

We shall also use the so-called curtailed paths, obtained from the ordinary paths by dropping the rightmost internal-state symbol $a_{j_k}$. We denote the path corresponding to a given input word q by q' and the curtailed path by q''.

It is evident that for a nonempty input word q to belong to the event represented in the automaton A by the output signal $y_1$, it is necessary and sufficient that the curtailed path q'' corresponding to the word q end with a pair $a_{j_{k-1}} x_{i_k}$ for which the output function takes the value $y_1$. We call all such (curtailed) paths paths of type $y_1$; generalizing to an arbitrary output signal $y_i$, we speak of paths of representative type.
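The notions of path, curtailed path and representative type translate directly into code. In this hypothetical Python sketch a path is a list of symbols alternating between states and letters, and the last function applies the criterion just stated to a nonempty word.

```python
def path(delta, a1, word):
    """The path q' = a1 x_{i1} a_{j1} ... x_{ik} a_{jk} of a word, as a list of symbols."""
    symbols, state = [a1], a1
    for x in word:
        state = delta[(state, x)]
        symbols += [x, state]
    return symbols


def curtailed(p):
    """The curtailed path q'': the path without its rightmost state symbol."""
    return p[:-1]


def is_of_type(delta, lam, a1, word, y):
    """True if the curtailed path of a nonempty word is a path of type y."""
    if not word:
        raise ValueError("the criterion applies to nonempty words only")
    q2 = curtailed(path(delta, a1, word))
    state, letter = q2[-2], q2[-1]  # the final pair a_{j_{k-1}} x_{i_k}
    return lam[(state, letter)] == y
```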

Paths (uncurtailed) corresponding to input words that transfer the automaton from some state $a_j$ back into the same state $a_j$ are called paths of type $a_j$, or paths of cyclic type. If a path q' of cyclic type $a_j$ contains no symbols of certain internal states $a_{k_1}, a_{k_2}, \ldots, a_{k_r}$, then we shall also call q' a path of type $a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$ (the symbols in the square brackets are called forbidden).

A path of any type is called simple if the corresponding curtailed path q'' does not contain two identical internal-state symbols. A finite automaton has only a finite number of different simple paths. All simple paths of any given type can be found directly from the switching table or, for paths of representative type, from the switching and output tables of the automaton.
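Finding the simple paths of a cyclic type amounts to a depth-first search that never revisits a state. The Python sketch below, with names of our own choosing, returns the simple curtailed paths of type a_j[forbidden] as tuples of symbols. Forbidden states are simply never entered.

```python
def simple_cyclic_paths(delta, alphabet, j, forbidden=frozenset()):
    """All simple curtailed paths of type a_j[forbidden].

    Each path starts at a_j and returns to a_j; its curtailed form (the final
    a_j dropped) repeats no state symbol and contains no forbidden state.
    """
    found = []

    def walk(state, visited, symbols):
        for x in alphabet:
            nxt = delta[(state, x)]
            if nxt == j:
                found.append(tuple(symbols + [x]))  # back at a_j: the curtailed path ends with x
            elif nxt not in visited and nxt not in forbidden:
                walk(nxt, visited | {nxt}, symbols + [x, nxt])

    walk(j, {j}, [j])
    return found
```

Striking out the leading a_j and joining the results with the disjunction sign gives exactly the material of a complex of cyclic type as defined further on.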


Let us construct some auxiliary events in the alphabet $Z = (x_1, x_2, \ldots, x_n, a_1, a_2, \ldots, a_p)$ whose elements are curtailed paths in the given automaton A. We define the event $S(y_1)$ of type $y_1$ as the event consisting of all (curtailed) paths of type $y_1$, and the simple event $P(y_1)$ of type $y_1$ as the disjunction of all simple (curtailed) paths of type $y_1$. For any cyclic type $t = a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$ $(j = 1, 2, \ldots, p;\ r \le p - 1)$ we call the iteration of the disjunction of all simple (curtailed) paths of type t the simple event $P(t)$ of type t (the disjunction of an empty set of paths is the impossible event, whose iteration coincides with the empty word e), and we call the event consisting of all curtailed paths of type t the event $S(t)$ of type t. Finally, we call the iteration of the part of the event $S(t)$ consisting of the words with only a single occurrence of the symbol $a_j$ the conditionally simple event $U(t)$ of type $t = a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$.

Let there be given some set of (curtailed) paths of type $y_1$, specified by a regular expression Q. Inserting into this regular expression, ahead of each occurrence of the internal-state symbol $a_j$, the regular expression of the event $S(a_j)$ of type $a_j$, or of the event $S(t)$ of type $t = a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$, we obtain a new regular expression which, as before, represents only paths of type $y_1$ (each inserted curtailed cycle leads from $a_j$ back to $a_j$, so the remainder of the path is unaffected). We call this operation the embedding of the event $S(a_j)$ (respectively $S(t)$) in the event Q.

Now let q'' be an arbitrary (curtailed) path of type $y_1$. The first (leftmost) internal-state symbol occurring in this path is the symbol $a_1$. Let us isolate also the last (rightmost) occurrence of the symbol $a_1$ in the path q'': $q'' = a_1 \cdots a_1 s$, where the word s no longer contains the symbol $a_1$. Then the path q'' can be represented as the product of some number of words of the conditionally simple event $U(a_1)$ of type $a_1$ and the word $a_1 s$.

In the word s we find the first (leftmost) internal-state symbol: $s = x_{i_k} a_{j_k} \cdots$. Finding also the last occurrence of this symbol in the word s, we can represent s as the product of the letter $x_{i_k}$, some number of words of the conditionally simple event of type $a_{j_k}[a_1]$, and some word $a_{j_k} r$, where r contains neither of the two internal-state symbols $a_1$ and $a_{j_k}$.

Continuing in the same way, we conclude that the path q'' is contained in the event obtained by embedding the conditionally simple event of type $a_1$ in some simple path of type $y_1$ ahead of the first occurrence in it of the symbol $a_1$, and by embedding the conditionally simple events of types $a_{j_k}[a_1]$, $a_{j_l}[a_1, a_{j_k}]$, ... ahead of the succeeding occurrences in this path of the internal-state symbols $a_{j_k}$, $a_{j_l}$, ..., and so on.

But exactly the same process can obviously be repeated with the words of the conditionally simple events that were separated from the original path q''. We then conclude that the path q'' of type $y_1$ occurs in the event obtained by embedding in the simple event $P(y_1)$ of type $y_1$ not the conditionally simple but the ordinary simple events of types $a_1$, $a_{j_k}[a_1]$, ..., followed by embedding conditionally simple events in the paths that constitute the embedded simple events: in the simple event of type $a_1$, events of the types $a_{\sigma_1}[a_1]$, $a_{\sigma_2}[a_1, a_{\sigma_1}]$, ...; in the simple event of type $a_{j_k}[a_1]$, events of the types $a_{\tau_1}[a_1, a_{j_k}]$, $a_{\tau_2}[a_1, a_{j_k}, a_{\tau_1}]$, ...; and so on (here $a_{\sigma_1}, a_{\sigma_2}, \ldots$ and $a_{\tau_1}, a_{\tau_2}, \ldots$ are the successive internal states occurring in the respective paths).
We further conclude that in the second stage, too, we can embed not the conditionally simple but the ordinary simple events, embedding in turn (in the third stage) in the words constituting them the conditionally simple events of still higher cyclic types (higher in the sense of the number of forbidden symbols).

Increasing the number of stages of successive embeddings, we finally arrive at the embedding of events of cyclic types in which all internal-state symbols except one are forbidden (the number of forbidden symbols grows at every stage and cannot exceed p − 1). Since for types of this kind there is no longer any difference between the conditionally simple and the ordinary simple events of the same type, the process of increasing the number of stages and of new embeddings thereby terminates.

As a result we conclude that the path q'' occurs in the event $S_1$ obtained by a finite number of successive embeddings (divided into a number of stages) of simple events of ever higher cyclic types into the simple event of type $y_1$. Since the path q'' was chosen arbitrarily, the event $S_1$ includes the event $S(y_1)$.

At the same time, as remarked above, embeddings of this kind cannot lead to an event containing paths of any type other than $y_1$. Consequently, $S_1 = S(y_1)$, and the embedding process we have described gives a regular expression $R_1$ for the event $S(y_1)$ consisting of all paths of type $y_1$.

Now deleting from the regular expression $R_1$ all internal-state symbols (replacing them with the empty word), we obtain a regular expression R which, as is easy to see, is nothing other than the regular expression for the sought event represented in the automaton A by the output signal $y_1$.

We  have  proved  the  following  proposition. 


Theorem 1. An event represented in an arbitrary finite Mealy automaton (and, consequently, also in an arbitrary finite Moore automaton) by any set of output signals is necessarily regular. There exists a universal constructive technique (the algorithm for the analysis of finite automata) that makes it possible to find the regular expressions for the events represented by sets of output signals in an arbitrary finite automaton.

The described algorithm for the analysis of finite automata can be given a form that is more convenient for practical applications [22]. To do this we shall work not with events in the set of paths, but with formal expressions called complexes.

For any set M of output signals of a given finite Mealy automaton A, the complex of type M (or complex of output type) is the disjunction of all simple curtailed paths ending with pairs $a_i x_j$ to which correspond output signals contained in the set M. The complex of type $a_i[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$ (complex of cyclic type) is the formal expression obtained by joining with the disjunction sign all simple curtailed paths of type $a_i[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$ with the initial symbol $a_i$ struck out, and enclosing the resulting formal polynomial in iteration brackets (here $a_i, a_{k_1}, \ldots, a_{k_r}$ are any pairwise different internal states of the automaton, and $0 \le r \le p - 1$, where p is the number of internal states of the automaton A).

First step of the analysis algorithm. From the switching and output tables of the automaton and the given (representative) set of output signals, by enumerating all possible variants of simple paths, we find the complex K(M) of type M and the complexes $K(a_i)$ for all internal states $a_i$ of the automaton A.

Second step. From the complexes $K(a_i)$, by deleting the unnecessary terms inside the iteration brackets (those containing forbidden symbols), we find the complexes of higher cyclic types $a_i[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$ $(r \ge 1)$ needed for the further constructions.


Third step. Starting from the complex of type M, we successively replace all internal-state symbols $a_i$ by complexes of cyclic type until we obtain an expression R containing no internal-state symbols at all. The replacement rule can be formulated as follows.

If the path $a_1 x_{i_1} a_{j_1} x_{i_2} a_{j_2} \cdots$ occurs in the complex of output type M, then the symbol $a_1$ is replaced by the complex of type $a_1$, the symbol $a_{j_1}$ by a complex of type $a_{j_1}[a_1]$, the symbol $a_{j_2}$ by a complex of type $a_{j_2}[a_1, a_{j_1}]$, and so on.

If the term $x_{i_1} a_{j_1} x_{i_2} a_{j_2} \cdots$ occurs in the complex of type $a_i[N]$, where N is a (possibly empty) set of internal states different from $a_i$, then the symbol $a_{j_1}$ is replaced by a complex of type $a_{j_1}[a_i, N]$, the symbol $a_{j_2}$ by a complex of type $a_{j_2}[a_i, a_{j_1}, N]$, and so on.


In the third step, after applying the replacement rule a finite number of times, we obtain the desired regular expression R for the event represented in the automaton A by the set M of output signals.
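A compact Python sketch of the algorithm follows, under assumptions of our own: the tables are dictionaries keyed by (state, letter), letters are single characters that are not regex metacharacters, and the result is written in the syntax of Python's re module, with the iteration {E} rendered as (?:E)* and the disjunction as |. Rather than tabulating all complexes in advance, the sketch builds each complex of cyclic type at the moment the replacement rule calls for it, which merges the second and third steps. It illustrates the rule and makes no attempt to simplify the result.

```python
import re


def analyze(delta, lam, a1, M, alphabet):
    """Regex (Python syntax) for the event represented by the output set M.

    delta[(a, x)] and lam[(a, x)] are the switching and output tables and a1
    is the initial state. Returns None if the event is impossible.
    """

    def complex_of(i, forbidden):
        # Complex of type a_i[forbidden], with the state symbols inside it
        # already replaced according to the rule for complexes of cyclic type.
        terms = []

        def walk(state, visited, parts):
            for x in alphabet:
                nxt = delta[(state, x)]
                if nxt == i:  # back in a_i: a simple cycle is complete
                    terms.append("".join(parts) + x)
                elif nxt not in visited and nxt not in forbidden:
                    # b is replaced by the complex of type b[a_i, earlier states, forbidden]
                    inner = complex_of(nxt, forbidden | visited)
                    walk(nxt, visited | {nxt}, parts + [x, inner])

        walk(i, frozenset({i}), [])
        # the iteration of the impossible event is the empty word e
        return "(?:" + "|".join(terms) + ")*" if terms else ""

    terms = []

    def walk_out(state, visited, parts):
        for x in alphabet:
            if lam[(state, x)] in M:  # the curtailed path ends with a pair of type M
                terms.append("".join(parts) + x)
            nxt = delta[(state, x)]
            if nxt not in visited:
                # a_{j_k} is replaced by the complex of type a_{j_k}[a_1, ..., a_{j_(k-1)}]
                walk_out(nxt, visited | {nxt}, parts + [x, complex_of(nxt, visited)])

    walk_out(a1, frozenset({a1}), [complex_of(a1, frozenset())])
    return "(?:" + "|".join(terms) + ")" if terms else None


def represents(regex, word):
    """True if `word` belongs to the event described by `regex` (None: impossible event)."""
    return regex is not None and re.fullmatch(regex, word) is not None
```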

The  following  proposition  follows  directly  from  the  described 
algorithm. 

Theorem 2. Every event represented in a finite Mealy automaton (or Moore automaton) having n internal states admits a regular expression whose cyclic depth does not exceed n.

As an example, let us find the regular expression for the event S represented by the output signal v in the automaton whose switching and output tables were described in §1 of the present chapter (its graph is shown in Fig. 7).

We find the complex K(v) of type v directly from the tables:

$$K(v) = 1y \vee 1y3x \vee 1x2x3x,$$

and the complexes of types 1, 2 and 3:

$$K(1) = e, \qquad K(2) = \{y \vee x3y\}, \qquad K(3) = \{x \vee y2x\}.$$

We write out the complexes of higher cyclic types that are needed:

$$K(2[1]) = K(2), \quad K(3[1,2]) = \{x\}, \quad K(2[3]) = K(2[1,3]) = \{y\}, \quad K(3[1]) = K(3).$$

Denoting the operation of embedding complexes by an arrow, we obtain the following sequence of embeddings:

$$\begin{aligned}
K(v) &\to y \vee yK(3[1])x \vee xK(2[1])xK(3[1,2])x \\
&= y \vee y\{x \vee y2x\}x \vee x\{y \vee x3y\}x\{x\}x \\
&\to y \vee y\{x \vee yK(2[1,3])x\}x \vee x\{y \vee xK(3[1,2])y\}x\{x\}x \\
&= y \vee y\{x \vee y\{y\}x\}x \vee x\{y \vee x\{x\}y\}x\{x\}x.
\end{aligned}$$

The last of the regular expressions obtained is the sought regular expression for the event S. It can be transformed and simplified using the relations that hold in the algebra of events.
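The result can be checked mechanically. The tables of §1 are not reproduced in this section, so the Python sketch below uses switching and output tables reconstructed from the complexes K(v), K(2) and K(3) given above. This reconstruction is an assumption, and every output other than v is lumped together as "u". The sketch compares the final expression, with {E} written as (?:E)*, against the automaton on all short words.

```python
import re
from itertools import product

# Tables reconstructed from K(v), K(2), K(3) (an assumption: the section does
# not reproduce the tables of §1; outputs other than v are written as "u").
DELTA = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 3,
         (2, "y"): 2, (3, "x"): 3, (3, "y"): 2}
LAM = {pair: "u" for pair in DELTA}
LAM[(1, "y")] = "v"
LAM[(3, "x")] = "v"

# y ∨ y{x ∨ y{y}x}x ∨ x{y ∨ x{x}y}x{x}x, with {E} written as (?:E)*
BOOK = r"(?:y|y(?:x|y(?:y)*x)*x|x(?:y|x(?:x)*y)*x(?:x)*x)"


def last_output(word, state=1):
    """Output of the automaton at the instant the last letter of `word` arrives."""
    out = None
    for letter in word:
        out = LAM[(state, letter)]
        state = DELTA[(state, letter)]
    return out


def agrees_up_to(n):
    """Compare BOOK with the automaton on every nonempty word of length <= n."""
    for length in range(1, n + 1):
        for letters in product("xy", repeat=length):
            word = "".join(letters)
            if (re.fullmatch(BOOK, word) is not None) != (last_output(word) == "v"):
                return False
    return True
```

With these tables, agrees_up_to(8) returns True. A short hand check also shows that S consists of the words y and yx together with all words of length at least three that end with xx.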

§4.  ABSTRACT  SYNTHESIS  OF  FINITE  AUTOMATA 

The abstract synthesis problem is the converse of the analysis problem for finite automata: it is necessary to find an effective method that makes it possible to obtain, from the regular expressions for given events, the switching and output tables of some finite automaton that represents these events.

The problem of synthesis of Moore automata is more general than that of synthesis of Mealy automata: since every Moore automaton can be interpreted as a Mealy automaton, by learning to synthesize Moore automata we also learn to synthesize Mealy automata. Therefore we shall solve the synthesis problem for Moore automata.

Let there be given, in some finite alphabet $X = (x_1, x_2, \ldots, x_n)$, the p regular expressions $R_1, R_2, \ldots, R_p$. Let us number all occurrences of the letters of the alphabet X in the expressions $R_1, R_2, \ldots, R_p$ by consecutive natural numbers, which hereafter we shall call the subscripts of the corresponding places of these expressions. We emphasize particularly that different occurrences of the same letter of the alphabet X thus receive different subscripts.

When a regular expression is developed into a word, each of the successively written letters of this word is identified with a particular occurrence of the corresponding letter in the expression being developed. We agree to consider that in this identification we enter particular places of the regular expression: when the last written letter is identified with the occurrence numbered with the subscript j, we shall consider that we are in the j-th place of the corresponding regular expression. We say that the j-th place of the regular expression $x_k$-follows the i-th place if, after the last letter of some word q has been identified with the occurrence having the subscript i, the last letter of the word $q x_k$ can be identified with the occurrence having the subscript j. In each regular expression there is also distinguished the initial place, to which the subscript 0 is assigned (the same for all given regular expressions). If, in the process of identification, the first letter $x_k$ of some word is identified with an occurrence of it in the regular expression having the subscript j, then we consider that the j-th place $x_k$-follows the zero (initial) place, common to all the given regular expressions.

Finally, if the membership of some word q in the event with the regular expression R is established as a result of identifying the last letter of the word q with its occurrence in R having the subscript j, then the j-th place of the expression R is called a final place of this expression.

For any finite set of regular expressions $R_1, \ldots, R_p$ in the same alphabet X, using the order of operations in the algebra of events defined above, it is not difficult to compose the place sequence table. The rows of this table are labeled by the letters of the alphabet X and the columns by the subscripts of all the places of the expressions $R_1, R_2, \ldots, R_p$ (including the initial place 0). At the intersection of the $x_i$-th row with the j-th column of the sequence table we write the subscripts of all the places that $x_i$-follow the j-th place. If there are no such subscripts, we put in the corresponding cell of the table a special symbol denoting the empty set of subscripts; we agree to use an asterisk for this symbol.
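The place sequence table is what is nowadays usually computed by the position (Glushkov) construction. The Python sketch below is one way to compute it. It assumes the expressions are given as small syntax trees in a tuple encoding of our own rather than as text, numbers the letter occurrences from left to right across all expressions, and collects for each place the places that can follow it. Membership of the empty word is ignored, in line with the agreement not to distinguish events that differ only by e.

```python
from itertools import count


def place_table(exprs, alphabet):
    """Place sequence table for expressions given as syntax trees.

    Tree encoding (an assumption of this sketch): ("sym", letter), ("eps",),
    ("or", A, B), ("cat", A, B), ("star", A), where ("star", A) stands for {A}.
    Returns (table, letter_of, finals): table[(x, j)] is the set of places that
    x-follow place j (an empty set plays the role of the asterisk), letter_of[j]
    is the letter at place j, finals[r] is the set of final places of exprs[r].
    """
    subscripts = count(1)
    letter_of, follow = {}, {0: set()}  # place 0 is the common initial place

    def number(node):
        # number the letter occurrences left to right across all expressions
        if node[0] == "sym":
            j = next(subscripts)
            letter_of[j], follow[j] = node[1], set()
            return ("sym", node[1], j)
        if node[0] == "eps":
            return node
        return (node[0],) + tuple(number(child) for child in node[1:])

    def scan(node):
        # returns (contains e, first places, last places) and fills follow[]
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            return False, {node[2]}, {node[2]}
        if kind == "star":
            _, first, last = scan(node[1])
            for j in last:
                follow[j] |= first  # iteration: the body may start again after its end
            return True, first, last
        e1, f1, l1 = scan(node[1])
        e2, f2, l2 = scan(node[2])
        if kind == "or":
            return e1 or e2, f1 | f2, l1 | l2
        if kind == "cat":
            for j in l1:
                follow[j] |= f2
            return e1 and e2, f1 | (f2 if e1 else set()), l2 | (l1 if e2 else set())
        raise ValueError(f"unknown node {kind!r}")

    finals = []
    for expr in exprs:
        _, first, last = scan(number(expr))
        follow[0] |= first
        finals.append(last)
    table = {(x, j): {k for k in follow[j] if letter_of[k] == x}
             for x in alphabet for j in follow}
    return table, letter_of, finals
```

For $R_1 = \{x \vee y\}x$ and $R_2 = y\{y\}$ it numbers the places 1, 2, 3 and 4, 5, gives $\{1, 3\}$ in row x, column 0, and $\{2, 4\}$ in row y, column 0, and finds the final places $\{3\}$ and $\{4, 5\}$.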

Let us construct the Moore automaton A whose internal states are all possible subsets of the place subscripts of the given regular expressions $R_1, R_2, \ldots, R_p$ (including the empty subset). The switching function $\delta$ of this automaton is constructed as follows: for any state $a_i$ of the automaton A (a set of place subscripts of the given expressions) and for any letter $x_j$ of the input alphabet, the state $a_k = \delta(a_i, x_j)$ is defined as the set of subscripts of all places that $x_j$-follow at least one of the places whose subscripts occur in $a_i$.

The shifted output function $\mu$ of the Moore automaton A is constructed for the output alphabet Y consisting of all possible subsets (including the empty subset) of the set of symbols $R_1, R_2, \ldots, R_p$ of the given regular expressions. For any state $a_i$ (set of subscripts) of the automaton A we take as $\mu(a_i)$ the set of all those expressions $R_r$ for which at least one of the subscripts occurring in $a_i$ is the subscript of a final place of $R_r$.

We have constructed a finite Moore automaton A. From the method of construction of its switching and output functions it follows directly that, with the state {0} chosen as the initial state, it represents each of the given events $R_1, \ldots, R_p$ and the complement S of their union. The event $R_i$ is represented by the set of all those output signals (sets of the symbols $R_1, R_2, \ldots, R_p$) in whose composition the symbol $R_i$ occurs $(i = 1, 2, \ldots, p)$. The event S is obviously represented by the empty set of the symbols $R_1, R_2, \ldots, R_p$.
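Given a place sequence table, the automaton can be built by following only the states reachable from {0}. In this Python sketch, states are frozensets of subscripts and missing table entries stand for the asterisk. The sample table is our own hand-computed one for $R_1 = \{x \vee y\}x$ (places 1, 2, 3) and $R_2 = y\{y\}$ (places 4, 5), and all names are hypothetical.

```python
def synthesize(table, alphabet, finals, names):
    """Moore automaton from a place sequence table (only states reachable from {0}).

    table[(x, j)]: places that x-follow place j; finals[r]: the final places
    of the expression called names[r]. Returns (delta, mu, start).
    """
    start = frozenset({0})
    delta, mu = {}, {}
    pending, seen = [start], {start}
    while pending:
        state = pending.pop()
        # shifted output: the expressions having a final place in this state
        mu[state] = frozenset(n for n, f in zip(names, finals) if state & f)
        for x in alphabet:
            # union of the places that x-follow any place of the current state
            nxt = frozenset().union(*(table.get((x, j), set()) for j in state))
            delta[(state, x)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return delta, mu, start


def output_after(delta, mu, start, word):
    """Output signal of the state reached after reading `word`."""
    state = start
    for x in word:
        state = delta[(state, x)]
    return mu[state]


# Hand-computed table for R1 = {x ∨ y}x and R2 = y{y}; absent entries mean "*".
TABLE = {("x", 0): {1, 3}, ("y", 0): {2, 4},
         ("x", 1): {1, 3}, ("y", 1): {2},
         ("x", 2): {1, 3}, ("y", 2): {2},
         ("y", 4): {5}, ("y", 5): {5}}
FINALS = [{3}, {4, 5}]
```

With these data only five of the possible subsets are reached, namely {0}, {1, 3}, {2, 4}, {2} and {2, 5}, which illustrates why it pays to build only the states that actually occur.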

As  a  result  we  have  proved  the  following  proposition. 

Theorem 1. Any regular event can be represented in a finite automaton. There exists a uniform constructive technique (the synthesis algorithm) that makes it possible, from any finite set of regular events given by regular expressions, to construct finite Moore or Mealy automata representing these events.

Combining this theorem with the result obtained in the preceding section, we obtain the following result.

Theorem 2. Regular events and only regular events are representable in finite automata.

A similar result, for automata of a special form (neural networks) and for a more cumbersome form of the definition of regular events, was obtained earlier by Kleene [40].

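The following proposition also follows from the results obtained above.

Theorem 3. The intersection of two (and therefore of any finite number of) regular events and the complement (in the set of all words in the basic alphabet) of any regular event are also regular events. Indeed, in the Moore automaton synthesized above for two regular events $R_1$ and $R_2$, their intersection is represented by the set of output signals containing both symbols $R_1$ and $R_2$, and the complement of $R_1$ by the set of output signals not containing $R_1$; by Theorem 2 these events are regular.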

The algorithm described above for the synthesis of finite automata also admits the following interpretation, which is more convenient for practical purposes [21].

Let there be given p regular expressions $R_1, \ldots, R_p$ in an arbitrary finite alphabet $X = (x_1, x_2, \ldots, x_n)$. If any of the expressions $R_i$ is the disjunction of several terms, then without loss of generality we can consider it to be enclosed in ordinary (noniterative) brackets. Specially introduced separation symbols (vertical bars) standing between any two symbols (letters, brackets, disjunction signs) of these expressions, and also to the left of an expression (initial place) and to the right of an expression (final place), will be called places in the expressions $R_1, \ldots, R_p$.

Places having a letter of the basic alphabet X directly on their left, together with the initial place, are called basic places; places having a letter of the alphabet X directly on their right are called prebasic. The initial places of all the expressions $R_1, R_2, \ldots, R_p$ are identified with one another in one single initial place.

We designate all the basic places by different nonnegative integers, the basic subscripts of these places. Here the initial place receives the basic subscript 0.

The action of each basic subscript extends not only to the corresponding place but also to the places (basic and nonbasic) subordinate to it. The place subordination rule expresses the order of operations in the algebra of events. It is defined by the following subscript extension rule.

The subscripts of the place directly ahead of any brackets (iterative or ordinary) extend to the initial places of all the terms standing inside these brackets. The subscripts of the final place of any term enclosed in brackets extend to the place directly following these brackets. The subscripts of the place directly preceding iterative brackets or a symbol of the empty word extend to the place directly following these brackets (respectively, following the given symbol e). Finally, the subscripts of the place directly following iterative brackets extend to the initial places of all the terms enclosed in these brackets.

All the subscripts that appear on basic and nonbasic places as a result of applying the rule just formulated are called nonbasic. The rule must be applied repeatedly until its application no longer leads to the appearance of new subscripts on any place.

The indexing of the given regular expressions, the labeling of the places and the extension of the subscripts according to the formulated rule constitute the first step of the synthesis algorithm.

The  second  step  consists  in  the  construction  of  the  switching 
table  of  the  sought  automaton  A.  Here  the  input  signals  are  the  letters 
of  the  original  alphabet  X,  and  the  internal  states  of  the  automaton 
are  identified  with  the  sets  of  the  basic  subscripts.  Let  us  agree  for 
definiteness  to  denote  these  sets  by  the  disjunction  of  the  component 
subscripts,  and  the  empty  set  of  subscripts  by  an  asterisk. 

The  rule  for  the  construction  of  the  automaton  switching  table 
amounts  to  the  following. 

The  single-element  set  consisting  of  the  subscript  0  serves  as 
the  Initial  state  of  the  automaton  A.  The  state  a1  Is  transformed  by 
the  Input  signal  x^  into  the  state  ay  consisting  of  the  basic  sub¬ 
scripts  of  all  the  basic  places,  separated  by  the  letter  x^.  from  the 
prebasic  places  directly  preceding  them,  whose  subscripts  (basic  or 
nonbasic)  contain  at  least  one  subscript  from  the  number  of  subscripts 
occurring  in  the  state  a^. 

In practical application of this rule it is convenient to single out
the basic subscripts by placing them above a horizontal line drawn
specially for this purpose. It is also advisable to single out all the
subscripts (basic and nonbasic) of the prebasic places, for example by
enclosing them in a rectangular frame. In constructing the switching
table it is sufficient to consider only the states which actually
appear in the course of the construction, starting from the initial
(zero) state.
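
The rule of the second step can be sketched in Python. This is a minimal illustration under stated assumptions: the result of the first step is taken as given in a hypothetical encoding (the names OCCURRENCES, next_state and switching_table are ours, not the book's), listing each letter occurrence as a triple (subscripts of the prebasic place before it, the letter, the basic subscript after it). The data encode the expressions R = x{y} and S = yx{x ∨ y} of the example at the end of this section, with similar places identified; the empty frozenset plays the role of the asterisk.

```python
from collections import deque

# Hypothetical encoding of step 1 for R = x{y}, S = yx{x v y}
# (similar places identified). Each letter occurrence is stored as
# (subscripts of the prebasic place before it, letter, basic subscript after it).
OCCURRENCES = [
    ({0}, "x", 1),  # R: x
    ({1}, "y", 1),  # R: y inside the iterative brackets
    ({0}, "y", 2),  # S: leading y
    ({2}, "x", 3),  # S: x following that y
    ({3}, "x", 3),  # S: x inside the iterative brackets
    ({3}, "y", 3),  # S: y inside the iterative brackets
]


def next_state(state, letter, occurrences):
    """Step-2 rule: basic subscripts of the places reached by `letter`
    from a prebasic place sharing at least one subscript with `state`."""
    return frozenset(
        basic for prebasic, a, basic in occurrences
        if a == letter and prebasic & state
    )


def switching_table(alphabet, occurrences):
    """Generate only the states reachable from the initial state {0}."""
    start = frozenset({0})
    table, seen, queue = {}, {start}, deque([start])
    while queue:
        state = queue.popleft()
        for letter in alphabet:
            target = next_state(state, letter, occurrences)
            table[state, letter] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return table


table = switching_table("xy", OCCURRENCES)
for (state, letter), target in table.items():
    # an empty set of subscripts is printed as the asterisk state
    print(sorted(state) or "*", letter, "->", sorted(target) or "*")
```

Only states reached from {0} are generated, in line with the remark above; for the example this yields the five states 0, 1, 2, * and 3.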

The third step of the synthesis consists in the construction of the
shifted output table or, what is the same thing, in the labeling of the
states of the automaton A with the output signals corresponding to
them. As output signals we take the various sets of the symbols of the
initial regular expressions (including the empty set). The state
labeling rule is as follows.

The state $a_i$ is labeled with the set of those symbols of the
expressions $R_1, R_2, \ldots, R_p$ whose final place subscripts (basic
and nonbasic) include at least one subscript from $a_i$.

The  states  labeled  with  the  empty  set  of  symbols  are  also  termed 
unlabeled. 

We note that the constructed Moore automaton represents the event $R_i$
by the set of all those output signals which contain the symbol $R_i$
($i = 1, 2, \ldots, p$).
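
A matching sketch of the third step, under the same kind of hypothetical encoding (FINAL_SUBSCRIPTS, STATES and label are illustrative names): each expression is represented only by the subscripts standing on its final place, with values taken from the example R = x{y}, S = yx{x ∨ y}, and a state receives the set of expression symbols whose final-place subscripts it meets.

```python
# Hypothetical data for the example R = x{y}, S = yx{x v y}:
# subscripts standing on the final place of each expression after step 1.
FINAL_SUBSCRIPTS = {"R": {1}, "S": {3}}

# States produced in step 2 (frozenset() is the asterisk state).
STATES = [frozenset({0}), frozenset({1}), frozenset({2}),
          frozenset(), frozenset({3})]


def label(state, final_subscripts):
    """Step-3 rule: symbols of the expressions whose final place shares a
    subscript with the state; an empty result means an unlabeled state."""
    return frozenset(
        name for name, subs in final_subscripts.items() if subs & state
    )


labels = {state: label(state, FINAL_SUBSCRIPTS) for state in STATES}
for state in STATES:
    print(sorted(state) or "*", sorted(labels[state]) or "( )")
```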

The fourth step of the synthesis algorithm consists in the
redesignation of the internal states and the output signals so as to
obtain a simpler form of the switching table and the shifted output
table. Here the internal states are most frequently numbered with the
consecutive natural numbers 1, 2, ..., k.

Finally, the fifth step of the synthesis algorithm is used when we are
required to synthesize a Mealy automaton rather than a Moore automaton.
It amounts to the construction of the conventional (unshifted) output
table. As follows from §1 of the present chapter, for this it is
sufficient to substitute in the switching table, in place of the
internal states, the output signals which label them.
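
The fifth step is a plain substitution. The sketch below types in, by hand, the labeled switching table obtained in the example later in this section after the redesignation of step 4 (DELTA, LABEL and mealy_output_table are illustrative names) and replaces every next state by the output signal labeling it.

```python
# Labeled switching table of the Moore automaton from the example,
# after the redesignation of step 4 (states 1..5, output signals u, v, w).
DELTA = {
    (1, "x"): 2, (2, "x"): 4, (3, "x"): 5, (4, "x"): 4, (5, "x"): 5,
    (1, "y"): 3, (2, "y"): 2, (3, "y"): 4, (4, "y"): 4, (5, "y"): 5,
}
LABEL = {1: "u", 2: "v", 3: "u", 4: "u", 5: "w"}


def mealy_output_table(delta, label):
    """Step 5: the Mealy output for (state, input) is the label of the
    state into which that input transfers the automaton."""
    return {key: label[target] for key, target in delta.items()}


OUTPUT = mealy_output_table(DELTA, LABEL)
for letter in "xy":
    print(letter, [OUTPUT[state, letter] for state in range(1, 6)])
```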

In the solution of practical problems arising in the synthesis of
automata it is frequently convenient to assign identical basic
subscripts to certain basic places, thereby identifying these places.
Such an identification is possible if identical sets of prebasic and
final places are subordinated to the identified places (places
satisfying this condition are termed similar).

Another case in which places may be identified concerns the so-called
corresponding places. Places in the various regular expressions
$R_1, R_2, \ldots, R_p$, or in the different terms enclosed in the same
brackets, are termed corresponding if identical paths (sets of words)
lead to them from the initial place or, respectively, from the place
directly preceding the brackets.

When the synthesis algorithm described above is used, the basic
subscripts of corresponding places always occur together in the states
of the automaton being synthesized. It is precisely this that makes
their identical indexing possible. The possibility of identifying
similar places is substantiated by the minimization algorithm described
in §5 of the present chapter.

We note that places should be identified with respect to only one of
the criteria (similarity or correspondence), since simultaneous
identification with respect to both criteria can lead to errors. In
particular, since the initial places are in fact identified with
respect to the correspondence criterion, when more than one event is
given we cannot, generally speaking, identify the initial place of any
event with another place using the similarity criterion.

The  validity  of  the  following  proposition  results  directly  from 
the  algorithm  described. 

Theorem 4. Events given by the regular expressions
$R_1, R_2, \ldots, R_p$ in some finite alphabet X can be represented in
a finite automaton (Mealy or Moore) having no more than $2^n$ internal
states, where n is the total number of occurrences of the letters of
the alphabet X in the expressions $R_1, R_2, \ldots, R_p$.

Let us consider what changes need to be made in the synthesis algorithm
when, in addition to the initial events $R_1, R_2, \ldots, R_p$, a
forbidden region S in the alphabet $X = (x_1, x_2, \ldots, x_n)$ is also
given.

The forbidden region can be specified either by some regular expression
or as the ensemble of words in the alphabet X that do not occur in any
of the events $R_1, R_2, \ldots, R_p$. These two methods are essentially
equivalent, since we can pass from either method of specification to
the other.

By its very definition, the forbidden region S admits right
multiplication by the ensemble F of all words (including the empty
word) in the alphabet X: SF = S. Therefore, when the forbidden region is
specified by a regular expression R, we can assume without loss of
generality that R has the form

$$R = R_1\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

The synthesis algorithm described above also solves the problem when a
forbidden region is present. In this case, however, many transitions in
the synthesized automaton are redundant in the sense that they will
never be used in actual operation of the automaton. The problem consists
in determining all such transitions and constructing, in place of the
conventional (completely determinate) automaton, a partial automaton in
whose switching and output tables dashes stand in the places of the
forbidden transitions. The conversion to a partial automaton gives
additional possibilities for the subsequent simplification of the
automaton.

When the forbidden region is specified as the ensemble of words in the
alphabet X that do not occur in any of the given regular expressions
$R_1, R_2, \ldots, R_p$, this problem is solved in a quite obvious way.
After the synthesis algorithm described above has been performed, the
output signal designated by the empty set of symbols
$R_1, R_2, \ldots, R_p$ corresponds to the appearance of a forbidden
input word. Consequently, it is sufficient to replace this output signal
in the output table by a dash and to put a dash in all the places of the
switching table corresponding to the appearance of the forbidden output
signal (when the output table is superimposed on the switching table,
the places marked with a dash in the two tables must coincide).

When the forbidden region is specified by a regular expression S, we
apply the usual synthesis algorithm to the expressions
$S, R_1, R_2, \ldots, R_p$ and consider as forbidden all outputs
designated by sets which include the symbol S. Forbidden outputs in the
output table are replaced by dashes, which are transferred to the
switching table by the method described above. It is clear that this
technique indeed solves the posed problem.

In this case we should assume that the expression S has the form

$$S = S_1\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

If the initially given expression for the forbidden region does not
satisfy this condition, it must be replaced by the expression

$$R = S\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

As an example, let us consider the synthesis of a partial Mealy
automaton representing the event R = x{y} when the forbidden region
S = yx{x ∨ y} is given. In the first step we label the places, index
them and extend the subscripts in the expressions R and S, making use of
the possibility of identifying similar places. Below, the subscripts
standing on each place are written at that place, and the subscripts of
prebasic places are enclosed in vertical bars in place of a rectangular
frame:

    R = |0| x 1 { |1| y 1 } 1

    S = |0| y |2| x 3 { |3| x 3 ∨ |3| y 3 } 3

Performing the second and third steps of the algorithm, we arrive at
the labeled switching table of the Moore automaton (the upper row gives
the labels of the states; ( ) denotes the empty set of symbols):

          ( )    R      ( )    ( )    S
          0      1      2      *      3
     x    1      *      3      *      3
     y    2      1      *      *      3

In the fourth step we introduce the redesignation 0 → 1, 1 → 2, 2 → 3,
* → 4, 3 → 5, ( ) → u, R → v, S → w (here the brackets ( ) designate
the empty set of symbols R and S). After this we obtain the labeled
switching table

          u      v      u      u      w
          1      2      3      4      5
     x    2      4      5      4      5
     y    3      2      4      4      5

Completing the conversion to the Mealy automaton in the fifth step, we
obtain the switching and output tables

          1   2   3   4   5              1   2   3   4   5
     x    2   4   5   4   5         x    v   u   w   u   w
     y    3   2   4   4   5         y    u   v   u   u   w

Finally, we perform the conversion to the partial automaton. In the
present case we consider the output signal w to be forbidden. The
switching and output tables then take the form

          1   2   3   4   5              1   2   3   4   5
     x    2   4   -   4   -         x    v   u   -   u   -
     y    3   2   4   4   -         y    u   v   u   u   -

State 5, however, is redundant, since the automaton can never pass into
it from the initial state 1. Discarding this redundant state, we arrive
at the final switching and output tables

          1   2   3   4              1   2   3   4
     x    2   4   -   4         x    v   u   -   u
     y    3   2   4   4         y    u   v   u   u
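
The last two operations of the example, putting dashes in place of the forbidden output w in both tables and discarding the states that cannot be reached from state 1, can be sketched as follows. The tables are those of the example; a dash is represented by None, and the helper names to_partial and reachable are hypothetical.

```python
from collections import deque

# Mealy switching and output tables from the example (states 1..5).
DELTA = {
    (1, "x"): 2, (2, "x"): 4, (3, "x"): 5, (4, "x"): 4, (5, "x"): 5,
    (1, "y"): 3, (2, "y"): 2, (3, "y"): 4, (4, "y"): 4, (5, "y"): 5,
}
OUTPUT = {
    (1, "x"): "v", (2, "x"): "u", (3, "x"): "w", (4, "x"): "u", (5, "x"): "w",
    (1, "y"): "u", (2, "y"): "v", (3, "y"): "u", (4, "y"): "u", (5, "y"): "w",
}
DASH = None  # stands for a dash in the tables


def to_partial(delta, output, forbidden):
    """Put a dash wherever the output is forbidden, in both tables."""
    p_delta = {k: DASH if output[k] == forbidden else t for k, t in delta.items()}
    p_output = {k: DASH if o == forbidden else o for k, o in output.items()}
    return p_delta, p_output


def reachable(delta, start):
    """States into which the automaton can actually pass from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for (source, _), target in delta.items():
            if source == state and target is not DASH and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


p_delta, p_output = to_partial(DELTA, OUTPUT, forbidden="w")
keep = reachable(p_delta, start=1)
final_delta = {k: t for k, t in p_delta.items() if k[0] in keep}
final_output = {k: o for k, o in p_output.items() if k[0] in keep}
print(sorted(keep))  # [1, 2, 3, 4]: the redundant state 5 is dropped
```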

§5. MINIMIZATION OF ABSTRACT AUTOMATA

As indicated above, we consider the abstract automaton as a device for
the realization of automaton mappings. It is therefore natural not to
distinguish between automata which are equivalent to one another, i.e.,
automata which induce identical mappings.

The primary task solved in the present section is the minimization of
automata, i.e., the problem of finding, in the class of all automata
equivalent to the given one, an automaton with the minimal number of
states. The method presented for the solution of this problem develops
the ideas of Mealy [55] and of Aufenkamp and Hohn [3, 4].

Let a and b be two states of the same Mealy automaton or of two
different Mealy automata having common input and output alphabets. If
for any input signal $x_i$ the output signals determined by the pairs
$(a, x_i)$ and $(b, x_i)$ are identical, then the states a and b are
termed 1-equivalent.

If 1-equivalent states are transformed by every input signal $x_i$ into
states which are also 1-equivalent to one another, they are termed
2-equivalent. If 2-equivalent states are transformed by every input
signal into states which are 2-equivalent to one another, they are
termed 3-equivalent, and so on.

It is easy to see that i-equivalent states produce identical output
words when any input word of length i is applied to them
($i = 1, 2, \ldots$).
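
A small brute-force sketch makes the inductive definition concrete. The automaton below is hypothetical, chosen only for illustration and not taken from the text; the code compares the recursive definition of i-equivalence with a direct comparison of the output words produced by all input words of length i.

```python
from itertools import product

# Hypothetical complete Mealy automaton (states 1..4, inputs x, y, outputs 0/1).
INPUTS = "xy"
DELTA = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 4, (2, "y"): 1,
         (3, "x"): 4, (3, "y"): 1, (4, "x"): 4, (4, "y"): 4}
OUTPUT = {(1, "x"): 0, (1, "y"): 0, (2, "x"): 1, (2, "y"): 0,
          (3, "x"): 1, (3, "y"): 0, (4, "x"): 0, (4, "y"): 0}


def i_equivalent(a, b, i):
    """Inductive definition: 1-equivalence compares outputs; i-equivalence
    requires (i-1)-equivalence of a and b and of their successors."""
    if i == 1:
        return all(OUTPUT[a, x] == OUTPUT[b, x] for x in INPUTS)
    return i_equivalent(a, b, i - 1) and all(
        i_equivalent(DELTA[a, x], DELTA[b, x], i - 1) for x in INPUTS
    )


def output_word(state, word):
    """Output word produced when `word` is applied to `state`."""
    out = []
    for x in word:
        out.append(OUTPUT[state, x])
        state = DELTA[state, x]
    return out


def same_outputs(a, b, i):
    """Direct check over all input words of length i."""
    return all(output_word(a, w) == output_word(b, w)
               for w in product(INPUTS, repeat=i))


for i in (1, 2, 3):
    assert all(i_equivalent(a, b, i) == same_outputs(a, b, i)
               for a in range(1, 5) for b in range(1, 5))
print(i_equivalent(1, 4, 1), i_equivalent(1, 4, 2))  # True False
```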

For any i = 1, 2, ... the i-equivalence relation has the properties of
reflexivity, symmetry and transitivity. Hence the set of all internal
states of a given Mealy automaton is partitioned by this relation into
disjoint classes of states which are i-equivalent to one another. We
term such classes i-equivalence classes or i-classes.

States which are i-equivalent for all i = 1, 2, ... are termed
equivalent states, and the classes defined by the equivalence relation
are termed equivalence classes or ∞-classes.

The following proposition follows directly from the definition of
equivalent states.

Theorem 1. Two states of the same Mealy automaton or of two different
Mealy automata are equivalent to one another if and only if the
application to them of any input word produces the same output word.

This proposition also makes it possible to formulate the following
result.

Theorem 2. Two Mealy automata are equivalent to one another (in the
sense that the automaton mappings they induce coincide) if and only if
their initial states are equivalent.

Applying the same input word p to two equivalent states a and b
transforms them again into equivalent states ap and bp. Since
equivalent states are at the same time 1-equivalent, for any input
signal $x_i$ the pairs $(a, x_i)$ and $(b, x_i)$ define identical
output signals.

Thus, for every Mealy automaton A we can construct a new Mealy automaton
B with the same input and output alphabets as automaton A, taking as
the set of its internal states the set of all equivalence classes of
automaton A. The transitions and outputs of automaton B are determined
as follows: the equivalence class $K_1$ is transformed by the input
signal $x_i$ into the equivalence class $K_2$ containing the state
$a_j x_i$, where $a_j$ is any state contained in the class $K_1$. To
the pair $K_1 x_i$ there is associated the output signal determined by
the pair $a_j x_i$. We term the automaton thus constructed the
canonical minimization of the Mealy automaton A.

The following proposition follows from the method of construction of
automaton B and from Theorem 1.

Theorem 3. In the canonical minimization of any Mealy automaton, no two
different internal states are equivalent to one another.

For the realization of automaton mappings it is sufficient to consider
only the so-called connected automata, i.e., those automata in which
every state is attainable, or, in other words, which can be transformed
from the initial state into any other internal state by applying a
suitable input word.

Indeed, if the mapping φ is induced by an unconnected automaton A, then
the attainable states of A form a new automaton B which is connected
and induces the same mapping φ. We note that the synthesis algorithm
described in the preceding section always yields connected automata.

Let us consider some connected Mealy automaton A which induces the
given automaton mapping φ. The canonical minimization B of automaton A
is also connected and realizes the same mapping φ (if we select as its
initial state the equivalence class $K_0$ containing the initial state
of automaton A).

Let D be any automaton realizing the same mapping, and let $d_0$ be its
initial state. Since automaton B is connected, for each of its states
$K_i$ we can select an input word $p_i$ such that $K_0 p_i = K_i$
($i \in M$, where M is the set of indices of the states of B). Let us
construct the mapping ψ of the set of states of automaton B into the
set of states of automaton D by setting $\psi(K_i) = d_0 p_i$
($i \in M$).

By Theorem 2, the initial states $K_0$ and $d_0$ of automata B and D
are equivalent to one another. But then the states $K_i = K_0 p_i$ and
$\psi(K_i) = d_0 p_i$ are also equivalent for any $i \in M$. If
$\psi(K_i) = \psi(K_j)$, then the states $K_i$ and $K_j$ are
equivalent, and by Theorem 3 this means that $K_i = K_j$. Thus the
mapping ψ is one-to-one, so that D has at least as many states as B.
This implies the validity of the following proposition.

Theorem 4. The canonical minimization of a connected Mealy automaton A
which induces a given automaton mapping φ has the smallest possible
number of internal states among all Mealy automata which induce the
same mapping φ (i.e., among all automata equivalent to automaton A).

This statement completely resolves the problem of minimizing Mealy
automata, provided that there exists a constructive technique for
building the equivalence classes of any given (connected) Mealy
automaton. Such a technique was suggested by Aufenkamp and Hohn [4] for
the case of finite Mealy automata. It is based on the following easily
proved proposition.

Theorem 5. If for some i the partition of the states of an automaton
into (i + 1)-classes coincides with its partition into i-classes, then
it is also the partition into ∞-classes.

Indeed, if every pair $(a_k, a_l)$ of i-equivalent states is also
(i + 1)-equivalent, then the states $a_k$ and $a_l$ are transformed by
any input signal $x_r$ into states which are i-equivalent to one
another. But then this same signal also transforms them into states
which are (i + 1)-equivalent to one another. Consequently, the states
$a_k$ and $a_l$ are not only (i + 1)-equivalent but also
(i + 2)-equivalent to one another.

Continuing in the same way, we find that the states $a_k$ and $a_l$ are
n-equivalent for all n = i, i + 1, i + 2, ... and are consequently
equivalent. Since the states $a_k$ and $a_l$ were chosen arbitrarily,
Theorem 5 is proved.

The Aufenkamp-Hohn algorithm for constructing the equivalence classes
(∞-classes) is based on the successive construction of the i-classes
for i = 1, 2, .... The partition into (i + 1)-classes is a subpartition
of the partition into i-classes, and a finite automaton A has only
finitely many states, so the partition can be properly refined only a
finite number of times. Hence after a finite number of steps we obtain,
on the basis of Theorem 5, the sought partition into ∞-classes.

The partition into 1-classes is obtained directly from the output table
of the automaton: all the states with identical columns in the output
table are combined into the same 1-class. Then the so-called 1-table is
constructed by replacing each internal state in the switching table of
the automaton by the 1-class containing it.

All the states which belong to the same 1-class and have identical
columns in the 1-table are combined into a single 2-class. We then
proceed similarly: replacing the automaton states in the switching
table by the 2-classes containing them, we obtain the 2-table. From the
2-table we find the 3-classes, combining into one 3-class all the
states of the same 2-class that have identical columns in the 2-table.

Having arrived after a finite number of steps at the partition into
∞-classes, we construct the canonical minimization of the original
automaton A directly from its switching and output tables.
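
The Aufenkamp-Hohn procedure and the construction of the canonical minimization can be sketched in Python for a complete Mealy automaton given by dictionaries keyed by (state, input). The automaton is hypothetical (states 2 and 3 have identical rows), the function names are ours, and class numbers 0, 1, 2, ... stand for the i-classes. Because each new key contains the current class, every new partition refines the previous one, so an unchanged number of classes means an unchanged partition, which is exactly the stopping condition of Theorem 5.

```python
# Hypothetical complete Mealy automaton (not from the text).
STATES = [1, 2, 3, 4]
INPUTS = "xy"
DELTA = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 4, (2, "y"): 1,
         (3, "x"): 4, (3, "y"): 1, (4, "x"): 4, (4, "y"): 4}
OUTPUT = {(1, "x"): 0, (1, "y"): 0, (2, "x"): 1, (2, "y"): 0,
          (3, "x"): 1, (3, "y"): 0, (4, "x"): 0, (4, "y"): 0}


def number_classes(keys):
    """Give equal keys the same class number (0, 1, ... by first appearance)."""
    ids = {}
    return {s: ids.setdefault(k, len(ids)) for s, k in keys.items()}


def aufenkamp_hohn(states, inputs, delta, output):
    """Return state -> class number for the partition into infinity-classes."""
    # 1-classes: identical columns of the output table.
    cls = number_classes({s: tuple(output[s, x] for x in inputs) for s in states})
    while True:
        # (i+1)-classes: same i-class and identical columns of the i-table.
        new = number_classes({
            s: (cls[s],) + tuple(cls[delta[s, x]] for x in inputs) for s in states
        })
        if len(set(new.values())) == len(set(cls.values())):
            return new  # no class split: Theorem 5 applies
        cls = new


def canonical_minimization(states, inputs, delta, output, cls):
    """Tables of automaton B whose states are the equivalence classes."""
    rep = {}
    for s in states:
        rep.setdefault(cls[s], s)  # any state of the class will do
    b_delta = {(k, x): cls[delta[r, x]] for k, r in rep.items() for x in inputs}
    b_output = {(k, x): output[r, x] for k, r in rep.items() for x in inputs}
    return b_delta, b_output


cls = aufenkamp_hohn(STATES, INPUTS, DELTA, OUTPUT)
b_delta, b_output = canonical_minimization(STATES, INPUTS, DELTA, OUTPUT, cls)
print(cls)  # {1: 0, 2: 1, 3: 1, 4: 2}
```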

We have thus constructed a minimization algorithm for any finite Mealy
automaton. For Moore automata certain changes must be introduced into
this algorithm, since if we interpret a Moore automaton as a Mealy
automaton and minimize it by the described algorithm, we obtain an
automaton which, although equivalent to the original one, may no longer
be a Moore automaton.

For a Moore automaton to remain a Moore automaton during minimization,
it is evidently necessary and sufficient that differently labeled
states of the automaton not be assigned to the same equivalence class.

This condition is most simply satisfied by introducing for the Moore
automaton the concept of 0-equivalence of states and the partition of
the set of states into 0-classes: we term any identically labeled
states of a Moore automaton 0-equivalent. If two 0-equivalent states
are transformed by every input signal into two 0-equivalent states,
they are termed 1-equivalent.

All further constructions (determination of the i-classes for i ≥ 2,
determination of the equivalence classes and construction of the
canonical minimization) are performed just as in the case of Mealy
automata. Of course, when constructing the canonical minimization B of
a Moore automaton, we can specify for it the shifted output table
instead of the output table, labeling the states of automaton B
(equivalence classes) with the same output signals that label the
states of the original automaton contained in them.
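
For Moore automata only the starting partition changes: the 0-classes group identically labeled states, and the refinement then proceeds as for Mealy automata. A minimal sketch on a hypothetical Moore automaton (not from the text; the function names are ours), where the canonical minimization is given by its switching table and its shifted output table, each class keeping the common label of its states:

```python
# Hypothetical Moore automaton: LABEL is the output signal labeling each state.
STATES = ["A", "B", "C", "D"]
INPUTS = "xy"
DELTA = {("A", "x"): "B", ("A", "y"): "C", ("B", "x"): "D", ("B", "y"): "A",
         ("C", "x"): "D", ("C", "y"): "A", ("D", "x"): "D", ("D", "y"): "D"}
LABEL = {"A": 0, "B": 1, "C": 1, "D": 0}


def number_classes(keys):
    """Give equal keys the same class number (0, 1, ... by first appearance)."""
    ids = {}
    return {s: ids.setdefault(k, len(ids)) for s, k in keys.items()}


def minimize_moore(states, inputs, delta, label):
    # 0-classes: identically labeled states.
    cls = number_classes({s: label[s] for s in states})
    while True:
        # next classes: same current class and successors in the same classes
        new = number_classes({
            s: (cls[s],) + tuple(cls[delta[s, x]] for x in inputs) for s in states
        })
        stable = len(set(new.values())) == len(set(cls.values()))
        cls = new
        if stable:
            break
    rep = {}
    for s in states:
        rep.setdefault(cls[s], s)
    b_delta = {(k, x): cls[delta[r, x]] for k, r in rep.items() for x in inputs}
    # shifted output table: each class keeps the label of its states
    b_label = {k: label[r] for k, r in rep.items()}
    return cls, b_delta, b_label


cls, b_delta, b_label = minimize_moore(STATES, INPUTS, DELTA, LABEL)
print(cls)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```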

However, the theory of minimization of Moore automata in the form just
described is not fully equivalent to the corresponding theory for Mealy
automata. In particular, the analogue of Theorem 1 does not hold for
Moore automata.

To make the two theories equivalent, it is necessary to consider as the
reaction of a Moore automaton to the input word p not the output word q
obtained from the general definition of the automaton given in §1, but
the word $y_0 q$, where $y_0$ is the output signal labeling the state in
which the automaton was before the word p was applied. With this
definition of the reaction of a Moore automaton to an input word, the
propositions obtained from Theorems 1, 2, 3 and 4 of the present section
by replacing all Mealy automata in their formulations by Moore automata
will evidently be valid. Theorem 5, of course, remains valid for Moore
automata with the usual definition of their output reactions.

When we return to the usual understanding of the output reactions of
the automaton, only those states into which the automaton cannot pass
from other states turn out to be in a special situation (in a connected
automaton only the initial state can have this property). It is easy to
verify that, to restore the parallelism between the theories of
minimization of Moore and Mealy automata, it is sufficient in this case
not to label such states with any output signal. This allows such
states to be assigned to any 0-class when the 0-classes are formed, and
thereby increases the possibilities of minimization.

However, the parallelism arising here holds not for the minimization of
conventional everywhere-determinate automata, but only after passing to
the more general problem of the minimization of partial automata.

The essence of the problem of minimizing partial automata is as
follows. Given a partial automaton A (Moore or Mealy) which induces a
partial automaton mapping φ defined on some set M of words in the input
alphabet, it is required to construct a partial automaton B (Moore or
Mealy, respectively) which induces a partial automaton mapping
coinciding with φ on the set M and which has the smallest number of
internal states among all automata (Moore or Mealy) satisfying this
condition.

Since there is only a finite number of different partial automata whose
input and output alphabets coincide with those of the given finite
partial automaton A and whose number of states does not exceed the
number of states of A, the formulated problem is algorithmically
solvable. However, the existing methods for the exact solution of this
problem involve extensive enumeration (see, for example, Ginsburg [19])
and are therefore unsuitable in practice.

In practice, the following technique is usually used for minimizing
partial automata: mentally filling in the dashed places in the
switching and output tables of the given partial automaton A, we
combine the states into i-classes and minimize using the same rules as
in the case of conventional (everywhere-determinate) automata. All
possible variants of combining states into classes are checked, and
from the resulting canonical minimizations we select the one with the
smallest number of states.

This technique indeed solves the problem of constructing a partial
automaton B with fewer states than the original automaton A, and the
partial automaton mapping ψ induced by automaton B coincides with the
partial automaton mapping φ induced by automaton A on the domain of
definition of φ. In this case the domain of definition of the mapping ψ
is, generally speaking, larger than that of φ.

We shall show how the described minimization technique works, using as
an example the partial Mealy automaton A given by the following
switching and output tables:

          1   2   3   4              1   2   3   4
     x    2   -   2   4         x    u   -   u   u
     y    -   3   4   4         y    -   u   v   v

Minimizing the given automaton, we obtain two initial possibilities for
combining the states into 1-classes that lead to the smallest number of
classes:

$$a_1 = (1, 3, 4),\ b_1 = (2) \quad\text{and}\quad a_1 = (1, 2),\ b_1 = (3, 4).$$

Let us consider the first possibility. The 1-table is

          1      2      3      4
     x    b_1    -      b_1    a_1
     y    -      a_1    a_1    a_1

From the 1-table we see that the class $a_1$ must be divided into two
2-classes, $a_2 = (1, 3)$ and $c_2 = (4)$; the third 2-class $b_2$
coincides with the 1-class $b_1 = (2)$. The 2-table has the form

          1      2      3      4
     x    b_2    -      b_2    c_2
     y    -      a_2    c_2    c_2

From the 2-table we see that the 3-classes coincide with the 2-classes,
which are consequently the desired ∞-classes. In this variant the
canonical minimization is given by the tables

          a_2    b_2    c_2              a_2    b_2    c_2
     x    b_2    -      c_2         x    u      -      u
     y    c_2    a_2    c_2         y    v      u      v

In the second variant of the minimization, the partition into 1-classes
$a_1 = (1, 2)$ and $b_1 = (3, 4)$ leads to the 1-table

          1      2      3      4
     x    a_1    -      a_1    b_1
     y    -      b_1    b_1    b_1

and to the partition into 2-classes $a_2 = (1, 2)$, $b_2 = (3)$,
$c_2 = (4)$. The 2-table is

          1      2      3      4
     x    a_2    -      a_2    c_2
     y    -      b_2    c_2    c_2

which implies that the obtained partition into 2-classes will not break
down further and, consequently, coincides with the partition into
∞-classes. The canonical minimization in this variant is given by the
tables

          a_2    b_2    c_2              a_2    b_2    c_2
     x    a_2    a_2    c_2         x    u      u      u
     y    b_2    c_2    c_2         y    u      v      v

The canonical minimizations obtained are essentially different: the
first reacts to the input word y with the output word v, while the
second reacts with the output word u (the word y lies outside the
domain of definition of the original automaton A, whose state 1 has a
dash in the y row).
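
The two variants can be checked mechanically. The sketch below encodes the partial automaton A of the example (None stands for a dash) and, for a given partition, tests that within each block the defined outputs agree and the defined successors fall into a single block. This compatibility test is a standard formulation assumed here rather than a procedure stated in the text, and it checks given partitions instead of enumerating all variants; the helper names are hypothetical.

```python
# Partial Mealy automaton A from the example; None stands for a dash.
INPUTS = "xy"
DELTA = {(1, "x"): 2, (2, "x"): None, (3, "x"): 2, (4, "x"): 4,
         (1, "y"): None, (2, "y"): 3, (3, "y"): 4, (4, "y"): 4}
OUTPUT = {(1, "x"): "u", (2, "x"): None, (3, "x"): "u", (4, "x"): "u",
          (1, "y"): None, (2, "y"): "u", (3, "y"): "v", (4, "y"): "v"}


def class_of(partition, state):
    return next(i for i, block in enumerate(partition) if state in block)


def merged_entries(partition, block, x):
    """Defined outputs and successor blocks of `block` under x (dashes ignored)."""
    outs = {OUTPUT[s, x] for s in block} - {None}
    succ = {class_of(partition, DELTA[s, x])
            for s in block if DELTA[s, x] is not None}
    return outs, succ


def is_closed(partition):
    """True if every block can serve as one state of a reduced automaton."""
    for block in partition:
        for x in INPUTS:
            outs, succ = merged_entries(partition, block, x)
            if len(outs) > 1 or len(succ) > 1:
                return False
    return True


def quotient(partition):
    """Switching and output tables of the reduced automaton (blocks 0, 1, ...)."""
    delta, output = {}, {}
    for i, block in enumerate(partition):
        for x in INPUTS:
            outs, succ = merged_entries(partition, block, x)
            delta[i, x] = succ.pop() if succ else None
            output[i, x] = outs.pop() if outs else None
    return delta, output


def run(delta, output, start, word):
    state, result = start, []
    for x in word:
        if state is None:
            raise ValueError("the word leaves the domain of definition")
        result.append(output[state, x])
        state = delta[state, x]
    return result


first = [{1, 3}, {2}, {4}]   # classes a_2, b_2, c_2 of the first variant
second = [{1, 2}, {3}, {4}]  # classes a_2, b_2, c_2 of the second variant
for partition in (first, second):
    d, o = quotient(partition)
    print(is_closed(partition), run(d, o, class_of(partition, 1), "y"))
```

Both partitions pass the test, and the two reduced automata answer the word y with v and u respectively; on words in the domain of A, such as xy, they agree.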

§6.  STRUCTURAL  SYNTHESIS  OF  FINITE  AUTOMATA 

In  the  structural  synthesis  stage  we  select  the  elementary  autom- 
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ata  from  which  the  synthesis  of  the  structural  diagram  of  the  given 
abstract  Mealy  or  Moore  automaton  is  accomplished.  We  shall  assume 
that  the  elementary  automata  are  of  two  kinds :  elementary  automata 
with  memory,  or  memory  elements  (triggers,  delay  lines,  etc.),  and  ele¬ 
mentary  automata  without  memeory,  also  termed  logic  elements.  For 
simplicity  we  shall  limit  ourselves  to  the  case  when  we  have  available 
only  one  type  of  memory  element. 

The  input  and  output  signals  of  both  the  elementary  automata  and 
of  the  automaton  under  consideration  as  a  whole  are  designated  (coded) 
with  a  finite  sequence  of  letters  of  some  fixed  finite  alphabet,  termed 
the  structural  alphabet.  Usually,  as  the  structural  alphabet  we  choose 
the  binary  alphabet,  consisting  of  two  letters:  0  and  1.  A  second 
alphabet  which  plays  a  very  important  role  in  the  structural  synthe¬ 
sis  stage  is  the  alphabet  consisting  of  the  symbols  of  the  internal 
states  of  the  selected  memory  elements.  We  tern  it  the  state  alphabet. 
The  state  alphabet  may  not  coincide  with  the  structural  alphabet,  how¬ 
ever,  in  practice  the  binary  alphabet  is  usually  also  chosen  as  the 
state  alphabet. 

One of the primary problems solved in the process of the structural
synthesis of automata is the writing out of the so-called canonical
equations, which relate the signals applied to the inputs of the memory
elements to the output signals of these elements and to the signals
applied to the input of the automaton as a whole. To ensure proper
functioning of the circuit, we cannot permit the input signals applied
to the memory elements to take part directly in forming output signals
which, through the feedback circuits, would be applied to these same
inputs at that same instant of time. In other words, the memory
elements must be Moore automata and not Mealy automata. However, the
complex automaton formed from these elements can, of course, be either
a Moore or a Mealy automaton.

Let us assume that the elementary automata with memory used in the
structural synthesis are Moore automata. After a corresponding shift in
the reference of the time intervals for the output signals, we shall
consider that the output signal of every memory element at any instant
of time t is determined by the internal state of this element at that
same instant of time.

In order to be able to synthesize arbitrary automata with a minimal
expenditure of memory elements, it is advisable to select as such
elements Moore automata having a complete system of transitions and a
complete system of outputs, which for brevity we shall term complete
automata. Completeness of the system of transitions means that for any
pair of states of the automaton there is an input signal which
transforms the first state of this pair into the second. This
requirement is equivalent to the following: every column of the
switching table must contain all the states of the given automaton.
Completeness of the system of outputs in the case of a Moore automaton
means that each state of the automaton is assigned its own output
signal, differing from the output signals of the other states.
Therefore, for complete Moore automata it is natural simply to identify
the output signals with the corresponding internal states of the
automaton. We shall adhere to this convention from now on.
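
The completeness of the system of transitions is easy to test mechanically. Below is a minimal Python sketch, assuming the switching table is stored as a dictionary from (input signal, state) pairs to next states; the helper name `has_complete_transitions` and the non-complete "hold/reset" element used for contrast are hypothetical illustrations, not taken from the text.

```python
def has_complete_transitions(states, inputs, delta):
    """Check completeness of the system of transitions: every column of the
    switching table (one column per current state) must contain every state."""
    return all({delta[(s, z)] for s in inputs} == set(states) for z in states)


STATES = (0, 1)

# Delay element: whatever the current state, the next state equals the input.
delay = {(s, z): s for s in (0, 1) for z in STATES}

# Hypothetical non-complete element: input "h" holds the state, "r" resets it to 0.
hold_reset = {("h", z): z for z in STATES}
hold_reset.update({("r", z): 0 for z in STATES})

print(has_complete_transitions(STATES, (0, 1), delay))           # True
print(has_complete_transitions(STATES, ("h", "r"), hold_reset))  # False: 1 unreachable from 0
```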

Let us choose as a memory element some complete Moore automaton B.
The internal states of this automaton are denoted by $z_1, z_2, \ldots, z_r$
($r \ge 2$). By the convention adopted above, they are also its output
signals. To designate the input signals of the memory element we shall
use the letters $s_1, s_2, \ldots$. The alphabet $(z_1, z_2, \ldots, z_r)$
is nothing other than the state alphabet. We shall select the structural
alphabet somewhat later; for the moment we shall show how to find the
so-called canonical equations of an automaton whose memory is composed
of elements of the selected type.

Assume that we are given the abstract finite Mealy or Moore automaton A
with the input alphabet $X = (x_1, x_2, \ldots, x_n)$, the output alphabet
$Y = (y_1, y_2, \ldots, y_m)$ and the set of internal states
$A = (a_1, a_2, \ldots, a_p)$, specified by the switching function
$\delta(a, x)$ and the output function $\lambda(a, x)$. Let the selected
memory element B be given by the switching function $\nu(z, s)$. We pose
the problem of finding the canonical equations of the automaton A under
the condition that its memory is constructed from several copies
$B_1, B_2, \ldots, B_k$ of the elementary automaton B.

For the automaton A to be constructed, the number k of memory elements
must satisfy the condition $r^k \ge p$, since k elements with r states
each have $r^k$ combined states. In this case the various internal
states $a_i$ can be identified with the various sets of states of the
automata $B_1, B_2, \ldots, B_k$. This identification will be termed the
coding of the states of the automaton A in the chosen state alphabet.
Of course, the coding is in essence not unique. However, we shall not go
into the details of selecting a particular coding variant, but shall
consider that this variant is already given.
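
The condition $r^k \ge p$ gives the smallest number of memory elements directly. The Python sketch below (the function names are illustrative) computes that k together with one possible coding, the "natural" one that writes the state number in base r with k digits; as the text stresses, any other one-to-one assignment would serve equally well.

```python
from itertools import product


def min_memory_elements(p, r):
    """Smallest k with r**k >= p: k copies of an r-state element have r**k
    combined states, enough to give each of the p states of A its own code."""
    if p < 1 or r < 2:
        raise ValueError("need p >= 1 states and r >= 2 element states")
    k = 0
    while r ** k < p:
        k += 1
    return k


def natural_coding(p, r):
    """One coding variant: state number n (0 <= n < p) -> its k base-r digits,
    most significant first. product() enumerates tuples in base-r counting order."""
    k = min_memory_elements(p, r)
    codes = list(product(range(r), repeat=k))
    return {n: codes[n] for n in range(p)}


print(min_memory_elements(7, 2))   # 3
print(natural_coding(7, 2)[5])     # (1, 0, 1)
```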

After coding, the states of the automaton A will be designated by the
k-dimensional vectors $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$, whose
components are letters $z_1, z_2, \ldots, z_r$ of the state alphabet; the
two-place output function $\lambda(a, x)$ of the automaton A is converted
into the $(k+1)$-place function $\lambda^0(z^{(1)}, \ldots, z^{(k)}, x)$,
which as before we shall term the output function. The equivalent of
the switching function $\delta(a, x)$ after the coding is the system of
k $(k+1)$-place functions
$\varphi_1(z^{(1)}, \ldots, z^{(k)}, x), \ldots, \varphi_k(z^{(1)}, \ldots, z^{(k)}, x)$
of the transitions in the elementary memories. The function $\varphi_i$
determines the state into which the i-th memory element must go at the
instant of time t + 1 if at the instant t the automaton A was in the
state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the signal x was applied
to its input ($i = 1, 2, \ldots, k$; $t = 0, 1, 2, \ldots$).

The next important step is the construction of the excitation functions
$\psi_i(z^{(1)}, \ldots, z^{(k)}, x)$ of the memory elements
($i = 1, 2, \ldots, k$). The value of each such function for the
selected state $(z^{(1)}, \ldots, z^{(k)})$ of the automaton A and the
input signal x is defined as the input signal of the i-th memory element
which, according to the switching function of that element, causes the
transfer $z^{(i)} \to \varphi_i(z^{(1)}, \ldots, z^{(k)}, x)$, i.e.,
$\nu(z^{(i)}, \psi_i) = \varphi_i(z^{(1)}, \ldots, z^{(k)}, x)$
($i = 1, \ldots, k$). A given transfer may be caused by more than one
input signal. Therefore the construction of the excitation functions of
the memory elements from their switching functions is not unique.
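
Finding an excitation value amounts to inverting the switching function ν of the memory element: we look for every input signal s with ν(z, s) equal to the required next state. A small Python sketch, assuming ν is given as an ordinary function; the input names of the four-input trigger ("hold", "reset", "set", "flip") are hypothetical labels. It shows that the signal is unique for the delay line and the complementing trigger but not for a trigger with more inputs.

```python
def excitation_inputs(nu, inputs, z_now, z_next):
    """All input signals s that move a memory element from z_now to z_next,
    i.e. all s with nu(z_now, s) == z_next. More than one answer means the
    excitation function can be chosen in more than one way."""
    return [s for s in inputs if nu(z_now, s) == z_next]


def delay(z, s):
    return s            # next state equals the input signal


def toggle(z, s):
    return z ^ s        # complementing trigger: s = 1 flips the state


MIXED_INPUTS = ("hold", "reset", "set", "flip")


def mixed(z, s):
    # Hypothetical four-input trigger: each input is one of the four maps {0,1} -> {0,1}.
    return {"hold": z, "reset": 0, "set": 1, "flip": 1 - z}[s]


print(excitation_inputs(delay, (0, 1), 0, 1))          # [1]
print(excitation_inputs(toggle, (0, 1), 1, 0))         # [1]
print(excitation_inputs(mixed, MIXED_INPUTS, 0, 1))    # ['set', 'flip']
```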

The selection of the best method of constructing the excitation
functions is associated with the problems of the following stage, the
stage of combinational synthesis. However, for many types of memory
elements (delay lines, triggers with a complementing input, etc.) the
input signal causing each transfer is unique. For several types of
elements we can indicate hybrid combinatorial-computational techniques
which give a simpler method of finding the excitation functions than
the general technique described here [23].

The excitation functions $\psi_i$, equated to the input signals $s_i$
which they determine, then give the sought canonical equations for the
feedbacks in the automaton A. However, these functions have a form
which is still not completely satisfactory: the states of automaton A
are coded in the universal (for the given type of memory elements)
state alphabet, which does not depend on the choice of the automaton A,
while various alphabets, including some which depend on the choice of
the automaton A, are used to designate the input and output signals.
Therefore it is necessary to fix the structural alphabet B, usually
determined by the coding actually selected for the input signals of the
memory element, and to code with finite sequences of letters of this
alphabet not only the input signals $\psi_i$ of the memory elements but
also the input and output signals x and y of the automaton as a whole.
This coding transforms the system of excitation functions found above
into the new system of functions

$$\psi_i^{(j)}(z^{(1)}, \ldots, z^{(k)}, u_1, \ldots, u_g) \qquad (i = 1, 2, \ldots, k;\; j = 1, 2, \ldots, l),$$

where each of the functions $\psi_i^{(j)}$ is the input signal (letter of
the structural alphabet) which must be applied to the j-th input channel
of the i-th memory element at the time when the automaton A is in the
state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the signals (letters of
the structural alphabet) $u_1, u_2, \ldots, u_g$ are applied to its input
channels.

Similarly, the output function $\lambda^0(z^{(1)}, \ldots, z^{(k)}, x)$
found above is replaced by the system of functions
$\lambda_j(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, u_1, \ldots, u_g)$
($j = 1, 2, \ldots, h$), where the function $\lambda_j$ determines the
output signal (letter of the structural alphabet) appearing on the j-th
output channel of the automaton A at the time when the automaton A is in
the state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the signals
$u_1, u_2, \ldots, u_g$ are applied to its input channels.

We term the resulting functions $\psi_i^{(j)}$ and $\lambda_j$
respectively the structural excitation functions and the structural
output functions of the automaton A.

In the case (usually encountered in practice) when both the
structural  alphabet  and  the  state  alphabet  are  binary  alphabets,  the 
structural  excitation  functions  and  the  structural  output  functions 
can  be  considered  as  ordinary  boolean  functions.  The  problem  of  the 
following  stage  (the  stage  of  combinational  synthesis)  amounts  to  the 
actual  construction  of  the  found  functions  from  the  elementary  logic 
functions,  realized  by  the  selected  logic  elements.  The  methods  of 
solution  of  this  problem  were  discussed  in  §4  of  the  preceding  chapter. 

As memory elements, the majority of modern digital automata use
complete Moore automata with two internal states. It is interesting to
analyze how many and which elementary automata have these properties.
Let us consider the case when the complete Moore automaton with the two
states 0 and 1 has only two input signals, x and y. From the conditions
of completeness it follows that each column of the switching table of
the automaton must contain both states, 0 and 1. This limitation leaves
only 4 possible switching tables:

      | 0  1          | 0  1          | 0  1          | 0  1
    --+------       --+------       --+------       --+------
    x | 0  0        x | 0  1        x | 1  1        x | 1  0
    y | 1  1        y | 1  0        y | 0  0        y | 0  1

After a redesignation of the input signals (interchanging x and y), the
third table coincides with the first and the fourth with the second.
Thus, there are only two essentially different automata of the required
type, given by the switching tables

      | 0  1                | 0  1
    --+------             --+------
    x | 0  0     and      x | 0  1
    y | 1  1              y | 1  0

If we set x = 0, y = 1, then the first table gives the well-known memory
element termed the delay (by one cycle) element, and the second gives
the equally familiar element termed the complementing trigger. We note
that the conventional electromagnetic relay with closing contacts can
be considered as a delay element.
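
With x = 0 and y = 1, the two tables become the rules "next state = input" and "next state = current state XOR input". A minimal Python sketch that runs both elements on the same input sequence, assuming the initial state 0 (the function name `run_element` is illustrative):

```python
def run_element(delta, z0, signals):
    """Sequence of states of a complete two-state Moore element. Since outputs
    are identified with states, this is also its output sequence."""
    states = [z0]
    for s in signals:
        states.append(delta(states[-1], s))
    return states


def delay(z, s):
    return s        # first table: x = 0 sets state 0, y = 1 sets state 1


def trigger(z, s):
    return z ^ s    # second table: x = 0 keeps the state, y = 1 flips it


inputs = [1, 1, 0, 1]
print(run_element(delay, 0, inputs))    # [0, 1, 1, 0, 1]
print(run_element(trigger, 0, inputs))  # [0, 1, 0, 0, 1]
```

The delay element simply repeats its input one cycle later, which is where its name comes from.
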

With an increase in the number of input signals, new types of memory
elements appear: the trigger with separate inputs, given by the
switching table

      | 0  1
    --+------
    x | 0  1
    y | 0  0
    z | 1  1

the mixed trigger, given by the table

      | 0  1
    --+------
    x | 0  1
    y | 0  0
    z | 1  1
    u | 1  0

and others.

When there are only two internal states it is not useful to increase
the number of input signals beyond four, since with a larger number of
input signals some of them will duplicate one another (cause identical
transitions in the automaton): with two states, an input signal can act
on the automaton in only four different ways. It is therefore not
difficult to compose a catalog of all complete, essentially different
Moore automata with two states (we consider as essentially different
those automata whose switching tables cannot be converted into one
another by redesignating the input signals). In addition to the four
types of automata listed above, the catalog contains three more
automata, given by the switching tables

      | 0  1          | 0  1          | 0  1
    --+------       --+------       --+------
    x | 0  1        x | 0  1        x | 0  0
    y | 0  0        y | 1  1        y | 1  1
    z | 1  0        z | 1  0        z | 1  0

Of course, each of the listed types of automata admits various
modifications resulting from different codings of the input signals in
the binary alphabet. Let us consider as an example the complete
synthesis (abstract and structural, up to the determination of the
canonical equations) of the Moore automaton A which is a sequential
binary squarer. The automaton A operates as follows: a two-digit binary
whole number is applied to its input digit by digit, lower digits first.
The square of this number must appear at the output, also sequentially,
beginning with the lower digits. Each input word consists of the two
digits of the number followed by two zeros, so that all four digits of
the square can be read out; for example, the input 1100 is the number 3
and the output 1001 is its square 9, both written lower digits first.
In other words, the automaton A must realize the following partial
mapping φ:

    0000 → 0000
    1000 → 1000
    0100 → 0010
    1100 → 1001
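
The partial mapping φ can be generated rather than listed. A Python sketch, assuming words are strings of 0 and 1 written lower digit first and each two-digit input is padded with two zeros, as in the mapping above (the function name `squarer_mapping` is illustrative):

```python
def squarer_mapping(n_bits=2):
    """Partial mapping of the sequential squarer: an n_bits number, lower digit
    first and padded with n_bits zeros, goes to the 2*n_bits digits of its
    square, also lower digit first."""
    width = 2 * n_bits
    mapping = {}
    for n in range(2 ** n_bits):
        word = "".join(str((n >> i) & 1) for i in range(n_bits)) + "0" * n_bits
        square = "".join(str(((n * n) >> i) & 1) for i in range(width))
        mapping[word] = square
    return mapping


for word, square in squarer_mapping().items():
    print(word, "->", square)
# 0000 -> 0000
# 1000 -> 1000
# 0100 -> 0010
# 1100 -> 1001
```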

It is not difficult to verify that the mapping φ, extended to the
initial segments of the words, satisfies the condition of automaticity.
Denoting the zero signal at the input by the letter x and at the output
by the letter u, and correspondingly the one signal at the input by y
and at the output by v, we write this correspondence in the form

    xxxx → uuuu
    yxxx → vuuu
    xyxx → uuvu
    yyxx → vuuv

In the resulting correspondence the output signal u represents the event

$$R = x \vee xx \vee xxx \vee xxxx \vee yx \vee yxx \vee yxxx \vee xy \vee xyxx \vee yy \vee yyx,$$

and the output signal v represents the event

$$Q = y \vee xyx \vee yyxx.$$

Words which do not occur in the events R and Q are forbidden. By use of
the forbidden words we can extend these events without danger of
impairing the reaction of the synthesized automaton to the specified
words.
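
Reading off R and Q is mechanical: after each initial segment of an input word the automaton must emit the corresponding letter of the response, and the automaticity condition says that the same initial segment must never demand two different letters. A Python sketch of this bookkeeping (the names are illustrative):

```python
CORRESPONDENCE = {
    "xxxx": "uuuu",
    "yxxx": "vuuu",
    "xyxx": "uuvu",
    "yyxx": "vuuv",
}


def events(correspondence):
    """Map each output letter to the event it represents: the set of nonempty
    initial segments of input words after which that letter is emitted."""
    letter_after = {}
    for word, response in correspondence.items():
        for t in range(1, len(word) + 1):
            prefix, letter = word[:t], response[t - 1]
            # Automaticity: one initial segment may not demand two letters.
            if letter_after.setdefault(prefix, letter) != letter:
                raise ValueError(f"not automatic at {prefix!r}")
    result = {}
    for prefix, letter in letter_after.items():
        result.setdefault(letter, set()).add(prefix)
    return result


ev = events(CORRESPONDENCE)
print(sorted(ev["v"]))   # ['xyx', 'y', 'yyxx']
print(len(ev["u"]))      # 11
```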

Since the events R and Q are disjoint, on the basis of what has been
said we can replace the event R by the complement Q' of the event Q. In
this case it suffices for the automaton to represent the single event Q;
the event Q' is then automatically represented by the set of all
unlabeled states of this automaton. Keeping in mind that the symbols u
and v will in any case be used for labeling the states, we denote the
event Q by the letter v and its complement Q' by the letter u.
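
The process of abstract synthesis leads to the following marking of the
regular expression for the event Q = v (the index of each place is
written as a subscript between the letters; the initial places carry
the index 0):

$$v = \left(\, {}_{0}\, y\, {}_{1} \;\vee\; {}_{0}\, x\, {}_{2}\, y\, {}_{3}\, x\, {}_{4} \;\vee\; {}_{0}\, y\, {}_{1}\, y\, {}_{5}\, x\, {}_{6}\, x\, {}_{4} \,\right)$$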

Here the same basic indices are assigned to a pair of corresponding
places (index 1) and to a pair of similar places (index 4). The labeled
switching table of the automaton corresponding to this marking is

        | u  v  u  u  v  u  u  u
        | 0  1  2  3  4  5  6  *
     ---+------------------------
      x | 2  *  *  4  *  6  4  *
      y | 1  5  3  *  *  *  *  *

Since  the  initial  state  0  represents  only  an  empty  word,  its 
label  can  be  considered  undetermined. 

After the state * is redesignated as 7, we obtain the table:

        | -  v  u  u  v  u  u  u
        | 0  1  2  3  4  5  6  7
     ---+------------------------
      x | 2  7  7  4  7  6  4  7
      y | 1  5  3  7  7  7  7  7
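
One can confirm that this table does what is required by running the four words of the correspondence through it. A Python sketch with the table transcribed as dictionaries; the undetermined label of state 0 is stored as None, which is harmless because state 0 is never re-entered and so its label is never emitted:

```python
# Labeled Moore table from the text; state 0 is initial, its label undetermined.
LABEL = {0: None, 1: "v", 2: "u", 3: "u", 4: "v", 5: "u", 6: "u", 7: "u"}
NEXT = {
    "x": {0: 2, 1: 7, 2: 7, 3: 4, 4: 7, 5: 6, 6: 4, 7: 7},
    "y": {0: 1, 1: 5, 2: 3, 3: 7, 4: 7, 5: 7, 6: 7, 7: 7},
}


def respond(word, state=0):
    """Moore automaton: after each input letter, output the label of the new state."""
    out = []
    for letter in word:
        state = NEXT[letter][state]
        out.append(LABEL[state])
    return "".join(out)


for word in ("xxxx", "yxxx", "xyxx", "yyxx"):
    print(word, "->", respond(word))
# xxxx -> uuuu
# yxxx -> vuuu
# xyxx -> uuvu
# yyxx -> vuuv
```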


The set of states of the Moore automaton given by the last table can be
divided into two 0-classes: $a_0 = (0, 2, 3, 5, 6, 7)$ and $b_0 = (1, 4)$
(the state 0, whose label is undetermined, is placed in the class of the
states labeled u). Let us construct the 0-table and from it determine
the 1-classes:

          0   1   2   3   4   5   6   7
         a₀  b₀  a₀  a₀  b₀  a₀  a₀  a₀
      x  a₀  a₀  a₀  b₀  a₀  a₀  b₀  a₀
      y  b₀  a₀  a₀  a₀  a₀  a₀  a₀  a₀

$a_1 = (0)$, $b_1 = (1, 4)$, $c_1 = (2, 5, 7)$, $d_1 = (3, 6)$.

We construct the 1-table and determine the 2-classes:

          0   1   2   3   4   5   6   7
         a₁  b₁  c₁  d₁  b₁  c₁  d₁  c₁
      x  c₁  c₁  c₁  b₁  c₁  d₁  b₁  c₁
      y  b₁  c₁  d₁  c₁  c₁  c₁  c₁  c₁

$a_2 = (0)$, $b_2 = (1, 4)$, $c_2 = (2)$, $d_2 = (3, 6)$, $f_2 = (5)$, $g_2 = (7)$.


The 2-table and the 3-classes have the form

          0   1   2   3   4   5   6   7
         a₂  b₂  c₂  d₂  b₂  f₂  d₂  g₂
      x  c₂  g₂  g₂  b₂  g₂  d₂  b₂  g₂
      y  b₂  f₂  d₂  g₂  g₂  g₂  g₂  g₂

$a_3 = (0)$, $b_3 = (1)$, $c_3 = (2)$, $d_3 = (3, 6)$, $e_3 = (4)$, $f_3 = (5)$, $g_3 = (7)$.


As is easily verified, the 3-classes coincide with the 4-classes;
therefore they are also the ∞-classes, so that the automaton can be
constructed by merging the states 3 and 6. After a corresponding
renumbering of the states (the old states 0, 1, 2, 4, 5, 7 become
1, 2, 3, 4, 5, 7, and the merged state {3, 6} becomes 6), the labeled
switching table of the desired Moore automaton A is

        | -  v  u  v  u  u  u
        | 1  2  3  4  5  6  7
     ---+---------------------
      x | 3  7  7  7  6  4  7
      y | 2  5  6  7  7  7  7
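
The successive construction of the 0-, 1-, 2- and 3-classes is a partition-refinement loop, which can be sketched in Python as below (the function name is illustrative). Following the text, the undetermined label of state 0 is treated as u; the new class of a state is fixed by its old class together with the old classes of its x- and y-successors, and the loop stops as soon as a partition repeats.

```python
LABEL = {0: None, 1: "v", 2: "u", 3: "u", 4: "v", 5: "u", 6: "u", 7: "u"}
NEXT = {
    "x": {0: 2, 1: 7, 2: 7, 3: 4, 4: 7, 5: 6, 6: 4, 7: 7},
    "y": {0: 1, 1: 5, 2: 3, 3: 7, 4: 7, 5: 7, 6: 7, 7: 7},
}


def k_class_partitions(label, nxt, undetermined="u"):
    """Return [0-classes, 1-classes, ...] up to the first partition that repeats.
    Each partition is a sorted list of sorted blocks of states."""
    states = sorted(label)
    key = {s: (label[s] or undetermined,) for s in states}  # 0-classes: by label
    history = []
    while True:
        blocks = {}
        for s in states:
            blocks.setdefault(key[s], []).append(s)
        partition = sorted(blocks.values())
        if history and partition == history[-1]:
            return history
        history.append(partition)
        index = {s: i for i, block in enumerate(partition) for s in block}
        # (k+1)-class key: own k-class plus the k-classes of all successors.
        key = {s: (index[s],) + tuple(index[nxt[a][s]] for a in sorted(nxt))
               for s in states}


for k, partition in enumerate(k_class_partitions(LABEL, NEXT)):
    print(f"{k}-classes:", partition)
# 0-classes: [[0, 2, 3, 5, 6, 7], [1, 4]]
# 1-classes: [[0], [1, 4], [2, 5, 7], [3, 6]]
# 2-classes: [[0], [1, 4], [2], [3, 6], [5], [7]]
# 3-classes: [[0], [1], [2], [3, 6], [4], [5], [7]]
```

Only the states 3 and 6 remain together in the final partition, which is why exactly these two states are merged.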

Going over to the structural synthesis, we choose as the memory element
a delay line with the switching table

      | 0  1
    --+------
    0 | 0  0
    1 | 1  1

Since the synthesized automaton has 7 states, while the memory element
has 2 states, we must select 3 memory elements ($2^3 \ge 7$). Let us
denote their internal states by the letters $z_1, z_2, z_3$ and their
input signals by $s_1, s_2, s_3$ (all these quantities can take the
values 0 and 1). We denote the states of the automaton A by the values
of the vector $(z_1, z_2, z_3)$. Let us take the so-called natural
system of coding of the states, in which each state is coded by writing
its number in the binary system:

    1 = 001;  2 = 010;  3 = 011;  4 = 100;  5 = 101;  6 = 110;  7 = 111.

Let us denote the physical input signal of the entire automaton A by
the letter c and the physical output signal by d, and let us rewrite
the labeled switching table of this automaton in accordance with the
chosen coding system (such a table is usually called the physical
switching table of the automaton, in contrast to the previously
considered abstract switching table, in which no account was taken of
the peculiarities of the coding).
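
The physical switching table can be generated from the minimized labeled table and the natural coding. A Python sketch, assuming c = 0 for the input letter x, c = 1 for y, d = 0 for u and d = 1 for v (the encoding fixed earlier in the example); the undetermined output of state 1 is recorded as None, and the unused code 000 simply does not appear:

```python
# Minimized labeled table of A (states 1..7), indexed by the physical input c.
LABEL = {1: None, 2: "v", 3: "u", 4: "v", 5: "u", 6: "u", 7: "u"}
NEXT = {
    0: {1: 3, 2: 7, 3: 7, 4: 7, 5: 6, 6: 4, 7: 7},   # c = 0 (letter x)
    1: {1: 2, 2: 5, 3: 6, 4: 7, 5: 7, 6: 7, 7: 7},   # c = 1 (letter y)
}
D = {None: None, "u": 0, "v": 1}


def code(n):
    """Natural coding: the state number written as binary digits (z1, z2, z3)."""
    return ((n >> 2) & 1, (n >> 1) & 1, n & 1)


def physical_table():
    """(z1, z2, z3, c) -> (z1', z2', z3'), the states at time t + 1."""
    return {code(n) + (c,): code(NEXT[c][n]) for c in NEXT for n in LABEL}


def physical_outputs():
    """(z1, z2, z3) -> d; None marks the undetermined output of state 1."""
    return {code(n): D[LABEL[n]] for n in LABEL}


for (z1, z2, z3, c), nxt in sorted(physical_table().items()):
    print(f"z={z1}{z2}{z3} c={c} -> {''.join(map(str, nxt))}")
```
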

The resulting physical switching table of the automaton gives the states
$z_1' = z_1(t+1)$, $z_2' = z_2(t+1)$, $z_3' = z_3(t+1)$ of the memory
elements at each succeeding moment of time t + 1 as boolean functions
of their states $z_1 = z_1(t)$, $z_2 = z_2(t)$, $z_3 = z_3(t)$ and of the
input signal $c = c(t)$ of the automaton A at the present moment of
time t. From the switching table of the delay element we see that its
state at any succeeding moment of time t + 1 coincides with the signal
at its input at the present moment of time. Therefore we can consider
that the written-out table gives the structural excitation functions of
the sought automaton A:

$$s_i = \psi_i(z_1, z_2, z_3, c) \qquad (i = 1, 2, 3).$$

We write out the tables of values which determine the functions
$s_1, s_2, s_3$ directly in the form of Karnaugh maps (see §4,
Chapter 2).

Using the methods developed in the preceding chapter, we find the
minimal disjunctive normal forms for the excitation functions (input
signals of the memory elements) of the automaton A:

$$s_1 = z_1 \vee z_2; \qquad s_2 = z_3 \vee \bar z_2 \vee \bar z_1 \bar c \vee z_1 c;$$
$$s_3 = \bar z_1 \bar c \vee z_1 z_2 z_3 \vee \bar z_2 \bar z_3 \vee c \bar z_3 \vee z_1 c.$$
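
The three formulas can be checked exhaustively against the physical switching table. For a delay element the excitation signal equals the next state, so each $s_i$ must reproduce $z_i(t+1)$ on every used state code, while the code 000 is a don't-care. A Python sketch, with the next-state table taken from the minimized table and the natural coding:

```python
from itertools import product

NEXT = {
    0: {1: 3, 2: 7, 3: 7, 4: 7, 5: 6, 6: 4, 7: 7},   # c = 0
    1: {1: 2, 2: 5, 3: 6, 4: 7, 5: 7, 6: 7, 7: 7},   # c = 1
}


def code(n):
    return ((n >> 2) & 1, (n >> 1) & 1, n & 1)


def neg(v):
    return 1 - v


def s1(z1, z2, z3, c):
    return z1 | z2


def s2(z1, z2, z3, c):
    return z3 | neg(z2) | (neg(z1) & neg(c)) | (z1 & c)


def s3(z1, z2, z3, c):
    return ((neg(z1) & neg(c)) | (z1 & z2 & z3) | (neg(z2) & neg(z3))
            | (c & neg(z3)) | (z1 & c))


def excitations_match():
    """True if (s1, s2, s3) equals the next state for all used codes 001..111."""
    for n, c in product(range(1, 8), (0, 1)):
        z = code(n)
        if (s1(*z, c), s2(*z, c), s3(*z, c)) != code(NEXT[c][n]):
            return False
    return True


print(excitations_match())   # True
```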

We also determine the structural output function (which gives the
output signal d of the automaton A as a function of the states of its
memory elements) directly from the labeled physical switching table of
the automaton A. From the Karnaugh map for this function we easily find
the minimal disjunctive form, which gives the required output
function d:

$$d = \bar z_1 \bar z_3 \vee \bar z_2 \bar z_3.$$
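
The output formula can be checked in the same way against the labels of the minimized table (v = 1, u = 0). In the Python sketch below the codes 000 (unused) and 001 (the initial state, whose label is undetermined) are omitted as don't-cares:

```python
def d(z1, z2, z3):
    return ((1 - z1) & (1 - z3)) | ((1 - z2) & (1 - z3))


# Required outputs for the states 2..7 in natural coding.
REQUIRED = {
    (0, 1, 0): 1,  # state 2, label v
    (0, 1, 1): 0,  # state 3, label u
    (1, 0, 0): 1,  # state 4, label v
    (1, 0, 1): 0,  # state 5, label u
    (1, 1, 0): 0,  # state 6, label u
    (1, 1, 1): 0,  # state 7, label u
}

print(all(d(*z) == v for z, v in REQUIRED.items()))   # True
```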

We note that all our functions turned out to be defined not on all sets
of argument values, and we have extended their definitions so as to
obtain representations for them which are as simple as possible.

Let us introduce as logic elements inverters and also two-input AND and
OR elements. Denoting them by circles containing the symbols of the
corresponding operations ¬, ∧, ∨, and denoting the memory (delay)
elements by squares with the letters $z_1$, $z_2$ and $z_3$ inside, we
obtain the circuit of the automaton A which is shown in Fig. 9. This
circuit includes, in addition to the three delay elements, 4 inverters,
5 AND elements and 7 OR elements (16 logic elements and 3 delay
elements in all).

Chapter 4

SELF-ORGANIZING SYSTEMS

§1. CONCEPT OF SELF-ALTERATION AND SELF-ORGANIZATION IN AUTOMATA

The concepts of the algorithm and of the discrete automaton are
generally associated with the idea of their invariability in time.
Their corresponding alphabetic representations are pictured as
something rigid, specified once and for all. With respect to the
classical concept of the algorithm and to the usual understanding of
the way a discrete automaton functions, this idea is to a certain
degree justified. At the same time everyone knows that the most
advanced self-organizing systems are at present being simulated on
general purpose electronic computers, which are nothing other than
discrete automata with a "rigid" structure, with the aid of programs,
which are in essence algorithms written in some special form.

The contradiction arising here is to a considerable degree only
apparent. The truth is that in the majority of cases the difference
between "rigid" and "self-altering" information converters is quite
arbitrary and is determined not so much by the design of the converter
itself as by the organization of the experiment using the converter.
The same information converter can under some conditions be considered
rigid and unchanging, while under other conditions it can be considered
self-altering and self-organizing.

To make these statements more precise, let us consider an arbitrary
discrete automaton A with the input alphabet X, the output alphabet Y,
the set of internal states A and the initial state $a_0$. The usual
picture of the functioning of the automaton implicitly presumes that
after a word p in the alphabet X has been applied to its input and the
corresponding output word φ(p) in the alphabet Y has been obtained, the
automaton returns to the initial state. Thus, at the moment when the
application of each new input word begins, the automaton is always in
the same state $a_0$. As a result, the mapping φ induced by the
automaton is rigid, unalterable, in the sense that the result of
converting any input word p by the mapping φ depends only on the word p
itself and not on the moment of time at which it was applied to the
automaton input.

We  will  term  each  of  the  possible  input  words  of  the  automaton  a 
question  and  the  corresponding  output  word  a  response.  In  this  case 
"rigidity"  of  the  automaton  amounts  to  the  fact  that  to  a  particular 
question  it  always  and  under  all  conditions  gives  the  same  response. 
The  automaton  is  thereby  deprived  of  any  capability  for  learning  and 
improvement  of  its  responses. 

However, the transition of the automaton into the initial state before
each new question, described above, is not at all mandatory. Moreover,
it is not specified directly in the definition of the functioning of
the automaton which was given in §6 of Chapter 3. It is natural to
define the functioning of the automaton so that the beginning of each
succeeding question finds the automaton in the state in which it was
after the termination of the response to the preceding question. With
this definition, an automaton which we previously considered to be
rigid and unalterable will, generally speaking, change its responses in
the course of time and can, in particular, be self-learning,
self-improving, etc.
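
The difference between the two ways of functioning is easy to see on a toy example. The Python sketch below uses a hypothetical two-state automaton (not one from the text) in which the letter a toggles the state and each output letter is the state before the step. With a reset before every question the same question always gets the same response, while without a reset the response depends on what was asked before.

```python
def run(delta, lam, state, word):
    """Feed one question to a Mealy-type automaton; return (response, final state)."""
    out = []
    for x in word:
        out.append(lam(state, x))
        state = delta(state, x)
    return "".join(out), state


# Hypothetical automaton: 'a' toggles the state, any other letter keeps it;
# the output is the state before the step.
def delta(a, x):
    return 1 - a if x == "a" else a


def lam(a, x):
    return str(a)


def rigid(questions, a0=0):
    """Reset to the initial state a0 before every question."""
    return [run(delta, lam, a0, q)[0] for q in questions]


def continuing(questions, a0=0):
    """Each question starts in the state left by the previous response."""
    responses, state = [], a0
    for q in questions:
        response, state = run(delta, lam, state, q)
        responses.append(response)
    return responses


print(rigid(["a", "a", "a"]))       # ['0', '0', '0']
print(continuing(["a", "a", "a"]))  # ['0', '1', '0']
```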

Let us now define more precisely the way a discrete automaton functions
as applied to the theory of self-organizing systems. Extremely important
concepts in this respect are the concepts of the cycle and of
information cycling.

Assume that some (finite or infinite) sequence of letters
$x_{i_1} x_{i_2} \ldots x_{i_n} \ldots$ is applied to the automaton input.
To it there corresponds some sequence of letters
$y_{j_1} y_{j_2} \ldots y_{j_n} \ldots$ at the automaton output. Let us
assume that we have identified some increasing sequence $k, l, m, \ldots$
of moments of discrete time ($1 \le k < l < m < \ldots$). Then each pair
of words

$$(x_{i_1} x_{i_2} \ldots x_{i_k},\; y_{j_1} y_{j_2} \ldots y_{j_k}), \quad (x_{i_{k+1}} x_{i_{k+2}} \ldots x_{i_l},\; y_{j_{k+1}} y_{j_{k+2}} \ldots y_{j_l}), \quad \ldots$$

will be termed a cycle, and the operation of identifying the cycles
itself will be termed the information (input and output) cycling
operation for the automaton in question.
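
Given the chosen moments k < l < m < …, the cycles are just consecutive slices of the two sequences. A Python sketch, assuming equal-length sequences of letters stored as strings and moments counted from 1, each moment closing a cycle (the function name `cycles` is illustrative):

```python
def cycles(inputs, outputs, moments):
    """Split paired input/output sequences at the increasing moments k < l < ...;
    the letters up to and including moment k (1-based) form the first cycle."""
    if not moments or moments[0] < 1 or any(a >= b for a, b in zip(moments, moments[1:])):
        raise ValueError("moments must be increasing and start at 1 or later")
    bounds = [0] + list(moments)
    return [(inputs[a:b], outputs[a:b]) for a, b in zip(bounds, bounds[1:])]


print(cycles("abcde", "uvwxy", [2, 5]))   # [('ab', 'uv'), ('cde', 'wxy')]
```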

In what follows we will assume that for each automaton under
consideration a particular class of admissible sequences of input
letters is identified, and that each such sequence (together with the
corresponding sequence of output letters) is partitioned into cycles.
The cycling operation is thus defined, generally speaking, not on one
particular pair of sequences but on all pairs of admissible sequences.

In the abstract approach to the concepts of the cycle and cycling no
additional meaning is involved beyond what has already been defined.
In practice, however, cycling always presumes that the pair of words
(input and output) composing each such abstract cycle is in some sense
a complete real cycle of functioning of the automaton, which can be
considered separately from the remaining cycles. Two cases are
encountered most frequently: the case when the first word of the pair
(the input word) is a question posed to the automaton and the second
word of the pair is the response to this question, and the case when
the second word of the pair is, as before, the response, but the first
includes, in addition to the question, an evaluation of the given
response as well. Of course, in both cases it is assumed that empty
letters which may occur in either the first or the second word need not
be taken into consideration.

The situation which arises in these two cases is shown in Figs. 10 and
11 respectively.

We note that in the general case it is frequently advisable in the
design of automata to provide for partial (and sometimes even complete)
overlapping of the response and the question (to begin the response
before the question is terminated). This situation is reflected in
Figs. 10 and 11. In performing the cycling operation, the boundaries of
the cycles are also determined basically by two methods. First, we can
simply fix some natural number k and require that the input and output
words in each cycle contain exactly k letters (including empty letters
as well). We will term this k-cycling. Second, we can define the
boundaries of the cycles by fixing for this purpose a special letter or
word, termed a label. For separating the cycles it is most convenient
to place such a label at the beginning of each successive question
(here we will consider that the label is a part of the question). We
will agree to call this method of cycling label cycling. It is obvious
that the combination of letters fixed as a label must be used
exclusively for this purpose.

A label can also be used within a cycle (for example, for indicating
the beginning of an evaluation), but this label must be different from
the label which indicates the boundary of the cycle. In the design of
the automaton it is frequently convenient to provide for the automaton
to put out a special label at the end of each response. We will
consider that in the operation of the automaton only admissible
sequences of input signals are encountered, and that for each such
sequence the corresponding partition into cycles has been performed.

Fig. 10. 1) cycle; 2) question; 3) response.

Fig. 11. 1) cycle; 2) question; 3) response; 4) evaluation.
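
Both ways of fixing the boundaries amount to a rule for choosing the cycle limits. A Python sketch, with "_" standing for the empty letter and "#" for a one-letter label; both symbols, and the simplification that a label is a single letter rather than a word, are assumptions of the example:

```python
def k_cycling(inputs, outputs, k):
    """Cycles of exactly k letters (empty letters included); an incomplete
    tail is not yet a cycle and is left out."""
    n = len(inputs) - len(inputs) % k
    return [(inputs[i:i + k], outputs[i:i + k]) for i in range(0, n, k)]


def label_cycling(inputs, outputs, label="#"):
    """A new cycle starts at every position where the input carries the label,
    so the label belongs to the question it opens. Letters before the first
    label belong to no cycle and are ignored."""
    starts = [i for i, letter in enumerate(inputs) if letter == label]
    bounds = starts + [len(inputs)]
    return [(inputs[a:b], outputs[a:b]) for a, b in zip(bounds, bounds[1:])]


print(k_cycling("ab_cd_", "__u__v", 3))      # [('ab_', '__u'), ('cd_', '__v')]
print(label_cycling("#ab#c", "__uv_", "#"))  # [('#ab', '__u'), ('#c', 'v_')]
```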

The ordered sequence of cycles preceding the given cycle C in a
particular fixed admissible sequence of the automaton A is termed the
learning history of the automaton A for the cycle C.

It is natural to term an automaton self-improving or self-learning if
it improves its responses as its learning history lengthens. This
definition, of course, in no way lays claim to exactness and must be
considered preliminary. The definition of the concept of improvement
(self-learning) of automata will be made more precise in one of the
following sections, after a preliminary consideration of the
probability-theoretic concepts which such definitions require. However,
it is useful to mention here the directions of this further refinement.
First of all, it is necessary to refine the concept of the quality of a
response by introducing some numerical evaluation of the response.
Under this condition we can give an exact meaning to the concept of
improvement of the quality of the response, which was used above in the
definition of the self-improvement of automata.

Further, we must keep in mind that even automata with the most clearly
marked tendency toward self-improvement do not necessarily improve their
responses to absolutely every question. Here we must consider the
improvement of the quality of the responses on the average. The same is
true of the learning history. Some relatively infrequent learning
histories can obviously lead to a deterioration of the average quality
of the responses; however, if the remaining learning histories lead to a
sharp improvement of the response quality, the automaton as a whole can
be considered self-improving.

Finally, it is obvious that we must distinguish between self-improvement
which is prespecified ahead of time by the designer of the automaton
(regardless of the form of the learning history) and genuinely
self-triggered self-improvement, which is determined by the learning
history that actually takes place and which therefore is not planned
ahead of time. It is clear that only the second type of self-improvement
actually deserves this name. In the first type, the designer actually
places the correct responses in the automaton ahead of time, but, in
order to simulate the process of self-improvement, forces the automaton
to keep this information hidden for a certain time. As a result the
automaton at first gives responses of poor quality, and only at the end
of some period (some number of cycles) does it begin, using the
information which has been stored in it, to give correct answers.
However, this sort of improvement of the quality of the automaton's
responses with time can hardly be termed self-improvement.

All that we have said gives an idea of the difficulties which must be
overcome in an exact definition of the concept of self-improvement. The
concept of self-organization, which seems to us to be somewhat more
general than that of self-improvement, is in a similar situation. With
self-improvement there must of necessity be an improvement in the
quality of the responses. With self-organization the quality of the
responses may not be defined at all; it is only necessary that in the
course of learning the automaton, on the average, increases the
definiteness of its responses. The corresponding refinement of the
definition will be given after the introduction of the necessary
probability-theoretic concepts.
In refining the concepts of self-organization and self-improvement it is
convenient to make use of the so-called cyclic reduction of automata.
Cyclic reduction is defined for automata in which the set of admissible
input sequences is fixed and the cycling of the input and output
information has been performed. When these conditions are satisfied, the
input and output alphabets of any automaton A can be replaced as
follows: the letters of the new input alphabet X' are taken to be all
the different input words of all cycles in all the admissible sequences,
and the letters of the new output alphabet Y' are similarly taken to be
all the different output words of these cycles.

For any state a of the automaton A and any letter x' of the alphabet X'
(the input word of some cycle), we use δ'(a, x') to denote the state into
which the automaton passes from the state a under the action of the input
word x'. We use λ'(a, x') to denote the output word delivered by the
automaton A under the action of the input word x' when the state a is
taken as the initial state. Any admissible input sequence of the
automaton A can be considered as a sequence x'(1)x'(2)... of letters of
the new input alphabet X'. Let us consider the set A' of all those states
of the automaton A into which it can be switched from the initial state
$a_0$ by input words of the form x'(1)x'(2)...x'(k) (k ≥ 0), i.e., by all
possible initial segments of the various admissible input sequences. The
initial state $a_0$ itself necessarily belongs to this set (it corresponds
to k = 0).

Now it is not difficult to construct the automaton A' in which the set of
internal states is the set A', the input alphabet coincides with the set
X', and the output alphabet coincides with the set Y'. The switching and
output functions of this automaton will be the functions δ' and λ'
defined above, and its initial state will be the state $a_0$. It is
assumed that only admissible input sequences (rewritten in the alphabet
X') will be applied to the input of the constructed automaton. In this
case, as is easily verified, the definition of the automaton A' is
completely correct: there are sufficient states and output letters to
describe completely the functioning of the automaton under the influence
of any admissible input sequence.

We agree to term the automaton A' thus constructed the cyclic reduction
of the original automaton A. Obviously the information cycling in the
automaton A' will be a 1-cycling. In other words, in the cyclic reduction
of any automaton both the questions and the responses are single
letters.
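The construction of A' can be sketched as a breadth-first search over the
states that A occupies at cycle boundaries. The Python sketch below is
illustrative only: it assumes a Mealy automaton stored as dictionaries
`delta[(state, letter)]` and `lam[(state, letter)]`, a k-cycling given by
a finite list `cycle_words` of input words, and that every concatenation
of these words is admissible; the function names are ours.

```python
from collections import deque

def run_word(delta, lam, state, word):
    """Apply an input word letter by letter; return (final state, output word)."""
    out = []
    for letter in word:
        out.append(lam[(state, letter)])   # output depends on current state and letter
        state = delta[(state, letter)]     # then switch to the next state
    return state, "".join(out)

def cyclic_reduction(delta, lam, a0, cycle_words):
    """Return (states, delta_r, lam_r) describing the cyclic reduction A'.

    states  -- states reachable from a0 by concatenations of cycle words
    delta_r -- maps (state, cycle word) to the state after the whole word
    lam_r   -- maps (state, cycle word) to the output word of that cycle
    """
    states, queue = {a0}, deque([a0])
    delta_r, lam_r = {}, {}
    while queue:
        a = queue.popleft()
        for w in cycle_words:
            b, out = run_word(delta, lam, a, w)
            delta_r[(a, w)], lam_r[(a, w)] = b, out
            if b not in states:            # new state reached at a cycle boundary
                states.add(b)
                queue.append(b)
    return states, delta_r, lam_r
```

For k-cycling over the alphabet {x, y} with all sequences admissible,
`cycle_words` would be every word of length k, for example
`["xx", "xy", "yx", "yy"]` when k = 2. Label cycling, where cycle words
can be arbitrarily long, would need a different (lazy) representation of
X'.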

Under cyclic reduction of automata, the number of their internal states
can only decrease or remain unchanged. The number of letters of the
input and output alphabets will, generally speaking, increase. It is
clear that with k-cycling of the original information, cyclic reduction
cannot cause a transition from finite alphabets to infinite ones.
However, in the case of label cycling such a transition is entirely
possible: after cyclic reduction a finite input or output alphabet may be
transformed into an infinite one.

Let us consider as an example the cyclic reduction of the automaton A
with the three states 1, 2, 3, the two input letters x, y and the two
output letters u, v, whose switching and output functions are given by
the respective tables

    δ | 1  2  3          λ | 1  2  3
    --+---------         --+---------
    x | 2  2  2          x | u  v  u
    y | 3  2  1          y | v  u  v

Assuming all input sequences admissible, taking state 1 as the initial
state and using 2-cycling of the input information, as a result of the
cyclic reduction we arrive at the automaton A' with two states (1 and 2),
four input letters $x_1 = xx$, $x_2 = xy$, $x_3 = yx$, $x_4 = yy$, and four
output letters $v_1 = uu$, $v_2 = uv$, $v_3 = vu$, $v_4 = vv$. The switching
and output functions of this automaton are given by the respective tables

    δ' | 1  2          λ' | 1   2
    ---+------         ---+-------
    x1 | 2  2          x1 | v2  v4
    x2 | 2  2          x2 | v1  v3
    x3 | 2  2          x3 | v3  v2
    x4 | 1  2          x4 | v4  v1
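The tables of A' can be checked by feeding each two-letter word to A from
states 1 and 2. The Python sketch below does this directly; it takes
λ(3, x) = u (A has only the output letters u and v), and the mappings
`in_names` and `out_names` are our own encoding of the letters
$x_1, \ldots, x_4$ and $v_1, \ldots, v_4$ defined above.

```python
from itertools import product

# Automaton A of the example: switching function delta and output function lam.
delta = {(1, "x"): 2, (2, "x"): 2, (3, "x"): 2,
         (1, "y"): 3, (2, "y"): 2, (3, "y"): 1}
lam = {(1, "x"): "u", (2, "x"): "v", (3, "x"): "u",
       (1, "y"): "v", (2, "y"): "u", (3, "y"): "v"}

in_names = {"xx": "x1", "xy": "x2", "yx": "x3", "yy": "x4"}
out_names = {"uu": "v1", "uv": "v2", "vu": "v3", "vv": "v4"}

def run(state, word):
    """Return the state after `word` and the output word produced on the way."""
    out = ""
    for letter in word:
        out += lam[(state, letter)]
        state = delta[(state, letter)]
    return state, out

switch_table, output_table = {}, {}
for state in (1, 2):                       # states of A reachable at cycle boundaries
    for word in ("".join(p) for p in product("xy", repeat=2)):
        nxt, out = run(state, word)
        switch_table[(in_names[word], state)] = nxt
        output_table[(in_names[word], state)] = out_names[out]

for x in ("x1", "x2", "x3", "x4"):
    print(x, switch_table[(x, 1)], switch_table[(x, 2)],
          output_table[(x, 1)], output_table[(x, 2)])
# x1 2 2 v2 v4
# x2 2 2 v1 v3
# x3 2 2 v3 v2
# x4 1 2 v4 v1
```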

Cyclic reduction gives a very clear answer to the question of whether
the automaton under consideration is rigid or self-altering with respect
to the given cyclization. Namely, the following proposition holds.

In order that the discrete automaton A with a given cyclization be rigid
(i.e., that it not alter its responses to the same question in the
course of time), it is necessary and sufficient that after cyclic
reduction the output function λ'(a, x') not depend on the states of the
reduced automaton A'.

Independence of the output function from the states of the automaton
obviously means that all the elements of each row of the output table of
the automaton are equal (elements standing in different rows can, of
course, be different).
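The criterion amounts to a row-by-row check of the reduced output table.
The Python sketch below (a hypothetical helper, not from the text)
receives that table as a mapping from each letter x' to the list of its
entries λ'(a, x') over all states a, and reports rigidity exactly when
every row is constant; the data are the output table of the reduction of
A from the example above.

```python
def is_rigid(output_table):
    """True if every row of the reduced output table is constant, i.e. the
    response lambda'(a, x') to each question x' does not depend on the state a."""
    return all(len(set(row)) == 1 for row in output_table.values())

# Output table of the reduction of A (columns: states 1 and 2).
reduced_A = {"x1": ["v2", "v4"], "x2": ["v1", "v3"],
             "x3": ["v3", "v2"], "x4": ["v4", "v1"]}
print(is_rigid(reduced_A))   # False: A is self-altering under 2-cycling
```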

Applied to the example considered above, the proposition just formulated
immediately reveals the self-variability of the automaton A for the case
of 2-cycling of its input information.

It is easy to see that an automaton in which the output function does not
depend on the states can be replaced by an equivalent automaton (i.e.,
one inducing the same alphabetic mapping) having one single internal
state. An automaton with a single internal state is in essence an
automaton without memory. In the abstract theory of automata it is shown
that every discrete automaton A can be minimized, i.e., it can be
replaced by an equivalent automaton B (the absolute minimization of the
automaton A) having the smallest number of states among all the automata
which induce the same alphabetic mapping as the automaton A.

If after the cyclic reduction of a discrete automaton A (with a given
cyclization of the information) we then perform an absolute
minimization, we obtain an operation which we shall term the complete
cyclic reduction of the automaton A (for the given cyclization). The
validity of the following proposition follows from the above
considerations.

In order that the discrete automaton A with given information cycling be
rigid, i.e., that its responses be independent of time, it is necessary
and sufficient that the complete cyclic reduction of the automaton A
yield an automaton without memory.

Equivalently, the existence of a nontrivial memory in the automaton
obtained by complete cyclic reduction of the automaton A means that the
automaton A is self-adaptive (relative to the given cyclization).
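Absolute minimization is not spelled out in this section; a standard way
to carry it out for automata whose states are all reachable (as in a
cyclic reduction) is Moore-style partition refinement, which merges
states that produce identical output words for all input words. The
Python sketch below uses that technique under an assumed dictionary
representation of δ' and λ'; by the proposition above, A is rigid for
the given cyclization exactly when a single class remains.

```python
def minimize(states, inputs, delta, lam):
    """Moore-style partition refinement for a Mealy automaton whose states are
    all reachable; returns a dict mapping each state to the index of its class
    of equivalent states (equivalent = same output word for every input word)."""
    # Initial partition: states that give the same output to every single letter.
    block = {a: tuple(lam[(a, x)] for x in inputs) for a in states}
    n_blocks = len(set(block.values()))
    while True:
        # Refine: equivalent states must also move into equivalent states.
        signature = {a: (block[a], tuple(block[delta[(a, x)]] for x in inputs))
                     for a in states}
        ids = {}
        new_block = {a: ids.setdefault(signature[a], len(ids)) for a in states}
        if len(ids) == n_blocks:      # no class was split, so the partition is stable
            return new_block
        block, n_blocks = new_block, len(ids)

def memoryless_after_minimization(states, inputs, delta, lam):
    """Criterion above: the complete cyclic reduction has a single state."""
    return len(set(minimize(states, inputs, delta, lam).values())) == 1

# Reduction of the example automaton A under 2-cycling (tables above).
inputs = ["x1", "x2", "x3", "x4"]
dA = {(1, "x1"): 2, (2, "x1"): 2, (1, "x2"): 2, (2, "x2"): 2,
      (1, "x3"): 2, (2, "x3"): 2, (1, "x4"): 1, (2, "x4"): 2}
lA = {(1, "x1"): "v2", (2, "x1"): "v4", (1, "x2"): "v1", (2, "x2"): "v3",
      (1, "x3"): "v3", (2, "x3"): "v2", (1, "x4"): "v4", (2, "x4"): "v1"}
print(memoryless_after_minimization({1, 2}, inputs, dA, lA))   # False
```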

The relativity of the property of self-adaptability of automata (its
dependence on the method of cycling the input information) is easily
illustrated by the example of the automaton C with two states, given by
the switching and output tables

    δ | 1  2          λ | 1  2
    --+------         --+------
    x | 2  1          x | u  v
    y | 2  1          y | v  u

For any admissible input sequences in the case of 2-cycling of the input
information, the cyclic reduction of the automaton C is an automaton
with a single state, since every pair of input letters returns C from
state 1 to state 1. Thus, even without minimization we obtain an
automaton without memory and, in view of the criterion formulated above,
we arrive at the conclusion that the automaton C is rigid (invariable).
At the same time, with 1-cycling of the input information the automaton
C must obviously be considered self-adaptive: its response to the same
question depends on whether an even or an odd number of letters preceded
it. In practical problems we can distinguish rigid and self-adaptive
automata quite easily, simply because the cyclization of the input
information is prespecified.
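The two cyclizations of C can be compared mechanically. The Python sketch
below (helper names are ours) collects the states that C occupies at the
boundaries of k-letter cycles and checks whether some k-letter question
receives different responses in different such states, for k = 1 and
k = 2.

```python
from itertools import product

# Automaton C as tabulated above: both letters interchange the states 1 and 2.
delta = {(1, "x"): 2, (2, "x"): 1, (1, "y"): 2, (2, "y"): 1}
lam = {(1, "x"): "u", (2, "x"): "v", (1, "y"): "v", (2, "y"): "u"}

def boundary_states(k, start=1):
    """States of C reachable from `start` at the boundaries of k-letter cycles."""
    seen, frontier = {start}, [start]
    while frontier:
        a = frontier.pop()
        for word in product("xy", repeat=k):
            b = a
            for letter in word:
                b = delta[(b, letter)]
            if b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def output_depends_on_state(k):
    """True if some k-letter question gets different responses in different
    boundary states, i.e. C is self-adaptive under k-cycling."""
    states = boundary_states(k)
    for word in product("xy", repeat=k):
        responses = set()
        for a in states:
            s, out = a, ""
            for letter in word:
                out += lam[(s, letter)]
                s = delta[(s, letter)]
            responses.add(out)
        if len(responses) > 1:
            return True
    return False

print(sorted(boundary_states(2)), output_depends_on_state(2))   # [1] False
print(sorted(boundary_states(1)), output_depends_on_state(1))   # [1, 2] True
```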

We note that in the criteria considered, and in the examples discussed on
the basis of these criteria, we spoke not of self-organization or
self-improvement, but only of the self-adaptivity of automata. Examples
of self-organization and self-improvement will be analyzed in later
sections, after the corresponding mathematical basis has been created and
precise definitions have been introduced.

In the remainder of the present section we shall consider one
terminological question: the use of the terms "system" or "automaton" in
combination with the concepts of self-adaptation, self-organization,
self-improvement and self-learning. As we see from the discussions
already presented, all these concepts can be developed for discrete
automata. However, with this approach we essentially lose the
possibility of penetrating into the structure of the corresponding
process (self-adaptation, self-organization, etc.).

The study of the structure of self-adaptation and self-organization
processes is facilitated by representing such processes not in the form
of individual automata (algorithms), but in the form of systems of
automata (algorithms). In the simplest case such a system consists of
two automata (algorithms). The first of these, termed the operational
automaton (algorithm), directly processes the information applied to the
system input. The second automaton (algorithm), termed the controlling
or learning automaton (algorithm), evaluates the results of the
functioning of the operational automaton (algorithm) and introduces
suitable changes into it (when we are considering automata, these changes
are introduced into the switching and output functions of the
operational automaton).
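The two-automaton scheme can be illustrated with a small Python sketch.
The controller's rule here is purely hypothetical (a trial-and-error
correction of the output table, driven by an external evaluation
function `evaluate`); the text does not prescribe any particular rule,
and a real controlling automaton might also modify the switching
function.

```python
class OperationalAutomaton:
    """Mealy automaton whose switching and output tables can be edited from outside."""
    def __init__(self, delta, lam, state):
        self.delta, self.lam, self.state = dict(delta), dict(lam), state

    def step(self, x):
        y = self.lam[(self.state, x)]
        self.state = self.delta[(self.state, x)]
        return y

class ControllingAutomaton:
    """First-stage controller with a hypothetical rule: if evaluate(x, y) rejects
    the response y to question x, replace the output-table entry that produced y
    by the next letter of the output alphabet."""
    def __init__(self, evaluate, outputs):
        self.evaluate, self.outputs = evaluate, list(outputs)

    def supervise(self, op, x):
        a = op.state                    # state in which the response is produced
        y = op.step(x)
        if not self.evaluate(x, y):
            i = self.outputs.index(y)
            op.lam[(a, x)] = self.outputs[(i + 1) % len(self.outputs)]
        return y

# Usage: a one-state operational automaton and an evaluator that accepts
# only the (hypothetical) target responses x -> v, y -> u.
target = {"x": "v", "y": "u"}
op = OperationalAutomaton({(0, "x"): 0, (0, "y"): 0},
                          {(0, "x"): "u", (0, "y"): "u"}, state=0)
ctrl = ControllingAutomaton(lambda x, y: target[x] == y, outputs="uv")
history = [ctrl.supervise(op, q) for q in "xyxyxy"]
print(history)   # ['u', 'u', 'v', 'u', 'v', 'u']
```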
Over the controlling automaton (algorithm) of the first stage there can
be placed a controlling automaton (algorithm) of the second stage, whose
function is to evaluate the operation of the automaton (algorithm) of the
first stage and introduce the required changes into it. By analogy we can
introduce controlling automata (algorithms) for the third, fourth, and
any higher stages. We shall term the hierarchies of automata (algorithms)
which arise in this fashion systems, and shall develop the concepts of
self-adaptation, self-organization and self-improvement for them.

Of course, in the abstract sense any system of automata, no matter how
complex, is equivalent to a single automaton; however, such a reduction
of systems to individual automata leads to the loss of the possibility of
studying certain properties of these systems which are of practical
interest, primarily the laws governing the circulation of information
within the system itself. Therefore, in what follows we shall deal not
only with automata (algorithms) considered abstractly but also with
systems of automata (algorithms) whose internal structure, i.e., the
relations between the individual automata (algorithms) composing the
system, is of particular interest.

§2. SOME AUXILIARY INFORMATION FROM PROBABILITY THEORY

In the present section we will present certain information from
probability theory which is needed for our further constructions. Since
this presentation is of an auxiliary nature, proofs of most of the
propositions formulated will not be included. If needed, the reader can
find the corresponding proofs in the monographs of Feller [81] and
Kramer [48]. It is assumed that the reader is familiar with such
elementary concepts of probability theory as the event, the probability
of an event, etc.

The concept of the random quantity is of very essential importance in
the construction of the theory of self-organizing systems. We shall limit
ourselves to random quantities which take on real values. Here, in
addition to the conventional so-called univariate random quantities, we
consider also multivariate random quantities, whose values are finite
ordered ensembles of real numbers or, what is the same, real vectors of a
particular (finite) dimension.

It is also important to distinguish between continuous and discrete
random quantities. A continuous random quantity can take any value in a
particular region (open set) of the corresponding vector space, for
example, in the case of univariate random quantities, on some interval of
the real axis (including the entire axis). By contrast, the totality of
the possible values of a discrete random quantity can only be a discrete
set of points, i.e., a set each point of which can be enclosed in a
sphere (possibly of very small radius) which contains no other points of
the same set. An example of a discrete set is the set of all points of a
Euclidean space which have integral coordinates.

The randomness of the quantities we have considered manifests itself in
so-called trials. In each trial the random quantity takes a particular
value from its domain of definition. The probability that the random
quantity will take a particular value is determined by the distribution
law of this random quantity.

The distribution law of a discrete random quantity x (univariate or
multivariate) is specified with the aid of a real function f(x), defined
for all values which the given random quantity can take, such that for
any value $x_i$ the magnitude $f(x_i)$ is equal to the probability that
the random quantity x will take the value $x_i$ in the given trial.

The domain of definition of the discrete random quantities which we are
considering can be either finite or denumerably infinite. It is evident
that in both cases the normalization condition

$$\sum_x f(x) = 1 \tag{47}$$

is satisfied, where the summation is assumed to extend over the entire
domain of definition of the given random quantity.
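A denumerably infinite domain can be illustrated with an assumed example
law, $f(x) = 2^{-x}$ for x = 1, 2, 3, .... The Python sketch below uses
exact rational arithmetic to show that the partial sums of (47) fall
short of 1 by exactly $2^{-N}$, so the full series sums to 1.

```python
from fractions import Fraction

def f(x):
    """Assumed example law on x = 1, 2, 3, ...: the probability that a fair coin
    first shows heads on toss number x."""
    return Fraction(1, 2 ** x)

for N in (1, 5, 20):
    partial = sum(f(x) for x in range(1, N + 1))   # sum in (47) truncated at N
    print(N, 1 - partial)                          # remainder equals 1/2**N
```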

When the random quantity x is continuous, its distribution law is given
with the aid of the so-called probability density function φ(x). This
function is presumed defined in the region M of definition of the random
quantity x and integrable in this region. In each trial the probability
p(N) that the random quantity x will take a value from some subregion N
of its region of definition is equal to the integral of the probability
density taken over this subregion:

$$p(N) = \int_N \varphi(x)\,dx. \tag{48}$$

The normalization condition follows directly:

$$\int_M \varphi(x)\,dx = 1. \tag{49}$$

Two random quantities x and y are termed mutually independent if the
distribution law of the quantity y does not change when the quantity x
takes a particular value, and vice versa. Similarly, the independence of
any set of random quantities means that, for any choice of a quantity x
from this set, the distribution law of x does not change when all the
other quantities of the set take any values.

Trials performed with a particular random quantity x are termed
independent trials if the distribution law of the quantity x remains
unchanged in each trial and, consequently, does not depend on the values
which the quantity x took in the previous trials.

If need be, the domain of definition of a continuous random quantity can
always be extended to the entire space by assuming that the probability
density is equal to zero everywhere outside the original domain of
definition.

We can also approximate continuous distribution laws by discrete laws and
vice versa. In the first case it is sufficient to partition the domain of
definition M of the corresponding continuous random quantity x into a
finite number of sufficiently small (not only in volume, but also in
diameter) subdomains $M_1, M_2, \ldots, M_n$, select within each such
subdomain a point $x_i$, and introduce the discrete distribution law f(x)
on the selected points by setting

$$f(x_i) = \int_{M_i} \varphi(x)\,dx \qquad (i = 1, 2, \ldots, n),$$

where φ(x) is the probability density of the original continuous random
quantity. In this case the probabilities previously associated with the
corresponding subdomains are concentrated at individual points.

In the reverse transition from the discrete distribution law to the
continuous one there is, on the contrary, a "diffusion" of the
probability initially concentrated at the individual points $x_i$ into
the corresponding subdomains $M_i$, so that for the probability density
function φ(x) thus obtained the following relations are valid:

$$\int_{M_i} \varphi(x)\,dx = f(x_i) \qquad (i = 1, 2, \ldots, n).$$
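Both transitions can be sketched for an assumed exponential density
$\varphi(x) = e^{-x}$ on [0, ∞), truncated to [0, 10] for the
illustration. In the Python sketch below the cells $M_i$ are equal
subintervals, the points $x_i$ are their midpoints, and the reverse
transition spreads each mass $f(x_i)$ uniformly over its cell, which is
one simple choice among many.

```python
import math

def F(x):
    """Distribution function of the assumed density phi(x) = exp(-x), x >= 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

n, L = 50, 10.0                                              # n cells covering [0, L]
edges = [L * i / n for i in range(n + 1)]
points = [(edges[i] + edges[i + 1]) / 2 for i in range(n)]   # chosen points x_i
f = [F(edges[i + 1]) - F(edges[i]) for i in range(n)]        # f(x_i) = integral over M_i

def phi_back(x):
    """Reverse transition: spread f(x_i) uniformly over its cell M_i."""
    for i in range(n):
        if edges[i] <= x < edges[i + 1]:
            return f[i] / (edges[i + 1] - edges[i])
    return 0.0

# The discrete law carries the mass of [0, L]; the tail beyond L is dropped here.
print(sum(f), F(L))
```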

Frequently it is necessary to consider infinite sequences of discrete
distribution laws $f_i(x)$ (i = 1, 2, ...) having some continuous
distribution law, with probability density function φ(x), as their
so-called limit distribution law. The following precise meaning is
embedded in the concept of the limit distribution. First, the domains of
values $M_i$ of the discrete random quantities with the distribution laws
$f_i(x)$ converge to the domain of values M of the continuous random
quantity x. In the cases we consider this convergence will mean that all
the $M_i$ are contained in M and that for any arbitrarily small positive
number ε there is a number N such that for any point x of M each of the
sets $M_i$ with i > N contains at least one point at a distance less than
ε from x. Second, for any subdomain P of the domain M and for any
arbitrarily small positive number δ, for all numbers i beginning with
some number, the following inequality must be satisfied:

$$\Bigl|\sum_{x \in M_i \cap P} f_i(x) - \int_P \varphi(x)\,dx\Bigr| < \delta. \tag{50}$$

The summation on the left side of this formula is taken over all the
points of $M_i$ contained in the subdomain P.
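Condition (50) can be observed numerically. The Python sketch below
assumes the exponential limit density $\varphi(x) = e^{-x}$ and builds
the i-th discrete law by concentrating the mass of each cell
[j/i, (j+1)/i) at its midpoint, for j = 0, ..., i² - 1, so that the
points of $M_i$ cover ever larger parts of M ever more densely; it then
evaluates the left side of (50) for P = [0, 0.3333].

```python
import math

def F(x):
    """Distribution function of the assumed limit density phi(x) = exp(-x), x >= 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def discrepancy(i, a, b):
    """Left side of (50) for P = [a, b] and the i-th discrete law f_i, which puts
    mass F((j+1)/i) - F(j/i) at the midpoint (j + 1/2)/i of each cell."""
    mass_in_P = sum(F((j + 1) / i) - F(j / i)
                    for j in range(i * i) if a <= (j + 0.5) / i <= b)
    return abs(mass_in_P - (F(b) - F(a)))

for i in (10, 100, 1000):
    print(i, discrepancy(i, 0.0, 0.3333))   # shrinks roughly like 1/i
```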

The concepts of the mean value (mathematical expectation) of a random
quantity and of its second-order central moments are of great importance
for the further constructions.

Let $x = (x_1, x_2, \ldots, x_n)$ be an n-dimensional continuous random
quantity with the probability density function
$\varphi(x_1, x_2, \ldots, x_n)$. As noted above, without loss of
generality the function φ can be considered to be defined over the
entire infinite space. Then the mean value of the random quantity x is
defined as the vector $m = (m_1, m_2, \ldots, m_n)$ computed from the
equation

$$m = (m_1, \ldots, m_n) = \int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} (x_1, \ldots, x_n)\,\varphi(x_1, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n. \tag{51}$$


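The second-order central moments are determined by the equations

$$\lambda_{ik} = \int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} (x_i - m_i)(x_k - m_k)\,\varphi(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n \qquad (i, k = 1, 2, \ldots, n). \tag{52}$$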
For a univariate random quantity x there is a natural second-order
central moment, determined by the equation

$$d = \int_{-\infty}^{\infty} (x - m)^2\,\varphi(x)\,dx. \tag{53}$$

This moment is usually termed the variance of the corresponding
distribution.
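Equations (49), (51) and (53) can be checked numerically for an assumed
univariate example, the exponential density $\varphi(x) = e^{-x}$
(x ≥ 0), for which the exact values are m = 1 and d = 1. The Python
sketch below uses a simple midpoint rule and truncates the integrals at
x = 50, where the neglected tail is below $10^{-18}$.

```python
import math

def phi(x):
    """Assumed example density: phi(x) = exp(-x) for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))

# phi vanishes for x < 0 and the tail beyond x = 50 is negligible, so the
# infinite limits of (51) and (53) are replaced by [0, 50].
total = integrate(phi, 0.0, 50.0)                           # normalization (49)
m = integrate(lambda x: x * phi(x), 0.0, 50.0)              # (51) with n = 1
d = integrate(lambda x: (x - m) ** 2 * phi(x), 0.0, 50.0)   # variance (53)
print(total, m, d)   # all three close to 1
```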

It is natural to transfer all these concepts and their definitions to
discrete random quantities as well.

To do this we need only, in equations (51)-(53), replace the integration
by a summation extended over the entire domain of definition M of the
corresponding discrete (vector) quantity x, and in place of the
probability density function $\varphi(x_1, x_2, \ldots, x_n)$ write the
probability distribution function f(x) of this quantity. As a result,
equation (51), for example, is rewritten as

$$m = \sum_{x \in M} x\,f(x). \tag{54}$$

All the remaining equations are changed similarly.
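In the discrete case the sums can be evaluated exactly. The Python sketch
below uses an assumed example, a fair die with f(x) = 1/6 on
M = {1, ..., 6}, and exact rational arithmetic for (47), (54) and the
discrete analogue of (53).

```python
from fractions import Fraction

# Assumed example: a fair die, f(x) = 1/6 on M = {1, ..., 6}.
f = {x: Fraction(1, 6) for x in range(1, 7)}

assert sum(f.values()) == 1                          # normalization (47)
m = sum(x * p for x, p in f.items())                 # mean value, equation (54)
d = sum((x - m) ** 2 * p for x, p in f.items())      # discrete analogue of (53)
print(m, d)   # 7/2 35/12
```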

For multivariate random quantities it is convenient to combine the
second-order central moments into the matrix $\|\lambda_{ik}\|$ and to
construct from them (in the case when they are finite) the quadratic form

$$Q(t_1, t_2, \ldots, t_n) = \sum_{i,k} \lambda_{ik} t_i t_k.$$

By a suitable orthogonal transformation (rotation) of the coordinate
system this form can always be reduced to the form
$\sum_{i=1}^{m} \alpha_i t_i'^2$, where the $t_i'$ are new coordinates and
the $\alpha_i$ are positive coefficients (i = 1, 2, ..., m). Forms of this
type are termed positive semidefinite. If m = n, i.e., if the number of
squares after reduction of the form Q is exactly equal to the dimension
of the space, then the form Q is termed positive definite. In this case
its determinant $|Q| = |\lambda_{ik}|$ is necessarily nonzero and
(strictly) positive.

For a positive definite form $Q = \sum_{i,k} \lambda_{ik} t_i t_k$ we can
define the inverse form

$$Q^{-1} = Q^{-1}(t_1, t_2, \ldots, t_n) = \sum_{i,k} \mu_{ik} t_i t_k,$$

whose coefficients are given by the equations $\mu_{ik} = Q_{ik}/|Q|$,
where $Q_{ik}$ is the algebraic complement (cofactor) of the element
$\lambda_{ik}$ of the determinant $|Q| = |\lambda_{ik}|$
(i, k = 1, 2, ..., n). This form will again be positive definite. The
matrix $\|\mu_{ik}\|$ thus obtained is obviously the inverse of the
matrix $\|\lambda_{ik}\|$, since the latter is symmetric (and, of course,
the matrix $\|\mu_{ik}\|$ is symmetric as well).
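The cofactor formula for $\mu_{ik}$ can be sketched directly. The Python
code below (helper names are ours) uses exact fractions and cofactor
expansion, which is adequate only for small n, and checks on an assumed
symmetric positive definite matrix that $\|\mu_{ik}\|$ is indeed the
inverse of $\|\lambda_{ik}\|$.

```python
from fractions import Fraction

def minor(a, i, k):
    """Matrix a with row i and column k deleted."""
    return [row[:k] + row[k + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def inverse_form(lam):
    """Coefficients mu_ik = Q_ik / |Q|, where Q_ik is the cofactor of lambda_ik
    (no transposition is needed because lam is symmetric)."""
    n, D = len(lam), det(lam)
    return [[Fraction((-1) ** (i + k) * det(minor(lam, i, k)), D) for k in range(n)]
            for i in range(n)]

lam = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]   # assumed symmetric, positive definite
mu = inverse_form(lam)
product = [[sum(lam[i][j] * mu[j][k] for j in range(3)) for k in range(3)]
           for i in range(3)]
print(det(lam), product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # 4 True
```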

The following fundamental result [48] is of great importance in
probability theory.

Theorem 1. If the n-variate random (continuous or discrete) quantities
$x_1, x_2, \ldots, x_k$ are independent and have the same distribution,
with finite second-order central moments $\lambda_{ik}$ for which the
form $Q(t_1, t_2, \ldots, t_n) = \sum_{i,k} \lambda_{ik} t_i t_k$ is
positive definite, and with mean value equal to zero, then as k → ∞ the
quantity $x = \frac{1}{\sqrt{k}}(x_1 + x_2 + \cdots + x_k)$ has a limit
continuous distribution, also with zero mean value, with the (n-variate)
probability density function

$$\varphi(t_1, t_2, \ldots, t_n) = \frac{1}{(2\pi)^{n/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1, t_2, \ldots, t_n)}.$$
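Theorem 1 can be illustrated by simulation. In the Python sketch below
the summands are an assumed example: $x_i = (a_i, a_i + b_i)$ with
$a_i, b_i$ independent and equal to ±1 with probability 1/2, so the mean
value is zero and $\lambda_{11} = 1$, $\lambda_{12} = 1$,
$\lambda_{22} = 2$, $|Q| = 1$, $Q^{-1}(t_1, t_2) = 2t_1^2 - 2t_1t_2 + t_2^2$.
For the limit density with n = 2, $P(Q^{-1}(x) \le c) = 1 - e^{-c/2}$ (a
standard property of the bivariate normal law, used here as a known
fact), so the observed share of samples with $Q^{-1}(x) \le 2$ should be
near $1 - e^{-1} \approx 0.632$.

```python
import math
import random

def normalized_sum(k, rng):
    """One sample of x = (x_1 + ... + x_k)/sqrt(k) for the assumed summands
    x_i = (a_i, a_i + b_i), with a_i, b_i independent and equal to +1 or -1."""
    A = 2 * bin(rng.getrandbits(k)).count("1") - k   # sum of the k values a_i
    B = 2 * bin(rng.getrandbits(k)).count("1") - k   # sum of the k values b_i
    s = math.sqrt(k)
    return A / s, (A + B) / s

def q_inv(t1, t2):
    """Inverse form for lambda_11 = 1, lambda_12 = 1, lambda_22 = 2 (|Q| = 1)."""
    return 2 * t1 * t1 - 2 * t1 * t2 + t2 * t2

rng = random.Random(1)
k, trials = 10_000, 20_000
share = sum(q_inv(*normalized_sum(k, rng)) <= 2.0 for _ in range(trials)) / trials
print(share, 1 - math.exp(-1))   # empirical share vs. limit-law prediction
```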


In the case when the form Q is degenerate, we turn to the consideration
of some subspace L of the original space, replacing the random quantities
$x_1, x_2, \ldots, x_k$ by their projections onto this subspace. The
subspace L is chosen so that the new form Q of the central moments
obtained as the result of this projection is positive definite, while the
projection onto any subspace perpendicular to it would lead to a
degenerate form (such a subspace always exists). In this case the
application of theorem 1 in the constructed space L gives a distribution
in the original space as well, since all possible values of the random
quantities $x_1, x_2, \ldots, x_k$ (and hence their sums as well) lie in
this subspace with probability 1.
In several cases we limit ourselves to the selection of a subspace K of
the maximal possible dimension among all subspaces with a nondegenerate
form Q. We note that the subspace L can be obtained from the subspace K
as the result of a nondegenerate linear transformation.

If, under the conditions of theorem 1, the mean value of the quantities
$x_1, x_2, \ldots, x_k$ is nonzero, then, denoting this mean value by
$m = (m_1, m_2, \ldots, m_n)$, we easily find that the mean value of the
random quantity $x = \frac{1}{\sqrt{k}}(x_1 + x_2 + \cdots + x_k)$ will
be $(m_1\sqrt{k}, m_2\sqrt{k}, \ldots, m_n\sqrt{k})$, and that for
sufficiently large k a good approximation for the distribution law of
the random quantity x will be the continuous law with the (n-variate)
probability density function

$$\varphi(t_1, t_2, \ldots, t_n) = \frac{1}{(2\pi)^{n/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1 - m_1\sqrt{k},\, t_2 - m_2\sqrt{k},\, \ldots,\, t_n - m_n\sqrt{k})}. \tag{55}$$

Let us now apply theorem 1 and equation (55) to the so-called binomial
distribution. The binomial distribution arises as the result of
conducting independent trials according to the so-called Bernoulli
scheme. This scheme, in addition to the independence of the trials, is
characterized by the fact that in each trial only two outcomes are
possible, occurring with the probabilities p and q = 1 - p respectively.
Let us term the first outcome a success and the second a failure of the
trial, and let us introduce the random quantity $x_i$, taking the value 1
in case of success of the i-th trial and the value 0 in case of its
failure (i = 1, 2, ..., n).

The random quantity $k = x_1 + x_2 + \cdots + x_n$ is clearly equal to the
total number of successes in n independent trials. Let us denote by
$C_n^k$ the number of combinations of n things taken k at a time (by
definition $C_n^k = \frac{n!}{k!\,(n-k)!}$); then it is easy to find that
the (discrete) distribution law of the random quantity k is given by the
function

$$f(k) = C_n^k p^k q^{n-k} \qquad (k = 0, 1, \ldots, n). \tag{56}$$
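Formula (56) can be checked exactly for assumed values n = 10, p = 1/3.
The Python sketch below uses `math.comb` for $C_n^k$ and rational
arithmetic, so the normalization (47), the mean value np and the variance
np(1 - p) (the values stated in theorem 2 below) come out exactly.

```python
from fractions import Fraction
from math import comb

def binomial_law(n, p):
    """Distribution (56): f(k) = C(n, k) p^k q^(n-k), k = 0, ..., n."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

n, p = 10, Fraction(1, 3)            # assumed example values
f = binomial_law(n, p)
mean = sum(k * fk for k, fk in enumerate(f))
var = sum((k - mean) ** 2 * fk for k, fk in enumerate(f))
print(sum(f), mean, var)             # 1 10/3 20/9
```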


This law is termed the binomial distribution law, since the quantity
$C_n^k p^k q^{n-k}$ is obviously the (n - k + 1)-th term of the expansion
of the expression $(p + q)^n$ by Newton's binomial formula.

The random quantities $x_1, x_2, \ldots, x_n$ have the same distribution
law, with the mean value $m = 1 \cdot p + 0 \cdot (1 - p) = p$ and the
variance (second central moment)
$d = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1 - p)$. From theorem 1 and
equation (55) it follows that for sufficiently large n a good
approximation for the distribution law of the quantity
$x = k/\sqrt{n} = \frac{1}{\sqrt{n}}(x_1 + x_2 + \cdots + x_n)$ will be
the distribution law with the probability density function

$$\varphi(x) = \frac{1}{\sqrt{2\pi p(1 - p)}}\, e^{-\frac{(x - p\sqrt{n})^2}{2p(1 - p)}}. \tag{57}$$

The distribution law with a probability density function of the form
$\frac{1}{\sqrt{2\pi a}}\, e^{-\frac{(x - m)^2}{2a}}$ (with a > 0) we shall
term the (generalized) univariate normal distribution law. It is not
difficult to see that the mean value of a random quantity distributed
according to this law is equal to m and that its variance is equal to a.

It is easy to see that when a normally distributed random quantity $x$ is multiplied by a constant factor $c$, the new quantity $y = cx$ is also normally distributed, and its mean value is $c$ times larger and its variance $c^2$ times larger than the mean value and the variance, respectively, of the original quantity $x$.

Comparing the results obtained with equation (57), we come to the
following  proposition. 

Theorem 2. With a sufficiently large number $n$ of Bernoulli trials with probability of success $p$, the distribution law of the total number of successes $k$ can be approximately expressed by the normal law with the probability density function

$$ \varphi(x) = \frac{1}{\sqrt{2\pi np(1-p)}}\, e^{-\frac{(x - pn)^2}{2np(1-p)}}. $$

The  mean  value  of  the  random  quantity  corresponding  to  this  law  is 
equal to $pn$ and its variance is equal to $np(1 - p)$, which agree with
the  exact  values  of  the  mean  value  and  the  variance  of  the  original 
discrete  random  quantity  k. 
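The approximation of Theorem 2 can be seen numerically. The sketch below is an illustration with the assumed values n = 400, p = 0.3 (function names are ours): it evaluates law (56) in log space via `math.lgamma` and compares it with the normal density of Theorem 2 near the mean $pn = 120$.

```python
from math import exp, lgamma, log, pi, sqrt

def binomial_pmf(k, n, p):
    # Law (56) in log space to avoid overflow of C_n^k for large n.
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

def normal_density(x, n, p):
    # Density of Theorem 2: mean pn, variance np(1 - p).
    d = n * p * (1 - p)
    return exp(-(x - p * n) ** 2 / (2 * d)) / sqrt(2 * pi * d)

n, p = 400, 0.3
for k in (100, 110, 120, 130, 140):
    print(k, round(binomial_pmf(k, n, p), 5), round(normal_density(k, n, p), 5))
```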

Because in the derivation of theorem 2 the quantity $x$ was multiplied by the factor $\sqrt{n}$, the continuous distribution $\varphi$ obtained in theorem 2 for the quantity $k$ does not, generally speaking, possess the property of being the limit distribution of the original (discrete) distribution $f$ of the quantity $k$ as the number of trials $n$ increases without bound.

However, it is not difficult to see that for sufficiently large values of $n$ the probabilities, calculated from the distributions $\varphi$ and $f$, of finding the quantity $k$ in any interval whose length is of the order of $\sqrt{n}$ (i.e., has the form $c\sqrt{n}$, where $c$ is a constant) will differ from one another by arbitrarily small amounts.

In practice we usually need to calculate the probability of finding the quantity $k$ on intervals of the form $[pn,\ pn \pm z\sigma]$, where the quantity $\sigma = \sqrt{np(1 - p)}$, equal to the square root of the variance of the distribution $\varphi$ (and hence of the distribution $f$ as well), is termed the mean square deviation (or the mean square error) of the distributions $\varphi$ and $f$. The following theorem is valid.

Theorem 3. For any positive number $z$, the probability $p(z)$ that the total number of successes in $n$ Bernoulli trials with probability of success $p$ will be found in the interval $[pn,\ pn \pm z\sqrt{np(1 - p)}]$ is expressed by the equation

$$ p(z) \approx \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{t^2}{2}}\, dt. $$

For any $z$, by choosing $n$ sufficiently large, we can make the error in the calculation of $p(z)$ by this equation arbitrarily small.
We give the numerical values of the function $\Phi(z)$ for some values of $z$ to four decimal places: $\Phi(1) = 0.3413$; $\Phi(2) = 0.4772$; $\Phi(3) = 0.4987$; for $z \ge 4$ the values of $\Phi(z)$ differ from 0.5000 by less than half a unit in the fourth decimal place.
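The tabulated values can be reproduced from the error function, since $\Phi(z) = \frac{1}{2}\,\mathrm{erf}(z/\sqrt{2})$. A minimal sketch using only `math.erf` (the name `laplace_phi` is ours). Note that $\Phi(3) = 0.498650\ldots$, which rounds to 0.4987 at four decimal places.

```python
from math import erf, sqrt

def laplace_phi(z: float) -> float:
    """Phi(z) = (1/sqrt(2*pi)) * integral_0^z exp(-t^2/2) dt = erf(z/sqrt(2)) / 2."""
    return 0.5 * erf(z / sqrt(2.0))

for z in (1, 2, 3, 4):
    print(z, f"{laplace_phi(z):.4f}")
# 1 0.3413
# 2 0.4772
# 3 0.4987
# 4 0.5000
```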

We term the approximate equation in the statement of theorem 3 the de Moivre-Laplace formula. As indicated in theorem 3, the accuracy of this formula increases as the number $n$ of trials performed increases.
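To see the de Moivre-Laplace formula at work, the following sketch (assumed values p = 1/2, z = 1; helper names are ours) sums the exact binomial probabilities of the interval $[pn,\ pn + z\sigma]$ and compares them with $\Phi(z)$ for growing $n$.

```python
from math import ceil, erf, exp, floor, lgamma, log, sqrt

def binomial_pmf(k, n, p):
    # Law (56) evaluated in log space, so that n = 10 000 does not underflow.
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

def laplace_phi(z):
    # Phi(z) = (1/sqrt(2 pi)) * integral from 0 to z of exp(-t^2/2) dt.
    return 0.5 * erf(z / sqrt(2.0))

def exact_interval_prob(n, p, z):
    """Exact probability that pn <= k <= pn + z * sigma, sigma = sqrt(np(1 - p))."""
    sigma = sqrt(n * p * (1 - p))
    lo, hi = ceil(p * n), floor(p * n + z * sigma)
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))

p, z = 0.5, 1.0
for n in (100, 1_000, 10_000):
    exact = exact_interval_prob(n, p, z)
    print(n, round(exact, 4), round(laplace_phi(z), 4), round(abs(exact - laplace_phi(z)), 4))
```

In this sketch the discrepancy is about 0.06 for n = 100 and falls below 0.01 for n = 10 000. Much of it comes from the lattice point $k = pn$, whose probability is of order $1/\sigma$, so it shrinks roughly like $1/\sqrt{n}$.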

Let us consider a series of independent trials, each of which has $m$ different outcomes, and let $p_i$ ($p_i > 0$) denote the probability of the $i$-th outcome of a trial ($i = 1, 2, \ldots, m$). We denote the total number of trials conducted by the letter $n$, and the number of those which terminated with the $i$-th outcome by $k_i$ ($i = 1, 2, \ldots, m$). It is easy to see that each of the quantities $k_i$, considered by itself, is distributed in accordance with the binomial law. We pose the problem of finding the joint (multivariate) distribution law of several quantities $k_i$, for example the quantities $k_1, k_2, \ldots, k_r$ ($1 \le r \le m$). It is not difficult to verify that the solution of this problem is given by the equation

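$$ f(k_1, k_2, \ldots, k_r) = \frac{n!}{k_1!\, k_2! \cdots k_r!\, (n - k_1 - \cdots - k_r)!} \times p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} (1 - p_1 - p_2 - \cdots - p_r)^{n - k_1 - k_2 - \cdots - k_r}. \qquad (58) $$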

For this distribution law, which is customarily termed the polynomial (multinomial) distribution law, we can obtain a continuous approximation just as we did above for its particular (univariate) case. To do this, let us consider the multivariate random quantities $x^j = (x_1^j, x_2^j, \ldots, x_r^j)$ such that the quantity $x^j$ takes one of the values $(1, 0, \ldots, 0)$, $(0, 1, \ldots, 0)$, ..., $(0, \ldots, 0, 1)$ or $(0, 0, \ldots, 0)$ according to whether the $j$-th trial of the series ends in the first, second, ..., $r$-th outcome or in any outcome different from these ($j = 1, 2, \ldots, n$). All the random quantities $x^j$ have the same distribution law, and their mean values are obviously equal to $(p_1, p_2, \ldots, p_r)$. The second central moment $\lambda_{ii}$ is obviously equal to $(1 - p_i)^2 p_i + (0 - p_i)^2 (1 - p_i) = p_i(1 - p_i)$ ($i = 1, 2, \ldots, r$). The second central moment $\lambda_{ik}$ with $i \ne k$ is also easily calculated:

$$ \lambda_{ik} = (1 - p_i)(0 - p_k)\, p_i + (0 - p_i)(1 - p_k)\, p_k + (0 - p_i)(0 - p_k)(1 - p_i - p_k) = -p_i p_k. $$
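Law (58) and the moments just computed can be cross-checked by brute force. In the sketch below (illustrative values n = 6, r = 2, $p_1 = 0.2$, $p_2 = 0.3$, so that the remaining outcomes have probability 0.5; names are ours), the table of (58) sums to 1, each $k_i$ taken alone is binomial, and the covariance of $k_1$ and $k_2$ equals $n\lambda_{12} = -np_1p_2$, since $(k_1, \ldots, k_r)$ is the sum of the $n$ independent vectors $x^j$.

```python
from itertools import product
from math import comb, factorial, prod

def multinomial_pmf(ks, n, ps):
    """Law (58): joint probability of the counts ks = (k_1, ..., k_r)
    of the first r outcomes, whose probabilities are ps = (p_1, ..., p_r)."""
    rest_k = n - sum(ks)
    if rest_k < 0:
        return 0.0
    rest_p = 1.0 - sum(ps)   # probability of "any other outcome"
    coeff = factorial(n) // (prod(factorial(k) for k in ks) * factorial(rest_k))
    return coeff * prod(p**k for p, k in zip(ps, ks)) * rest_p**rest_k

n, ps = 6, (0.2, 0.3)        # r = 2 of m = 3 outcomes; the third has probability 0.5
table = {ks: multinomial_pmf(ks, n, ps) for ks in product(range(n + 1), repeat=2)}

total = sum(table.values())
mean = [sum(ks[i] * f for ks, f in table.items()) for i in range(2)]
cov12 = sum((ks[0] - mean[0]) * (ks[1] - mean[1]) * f for ks, f in table.items())
print(total, mean, cov12)    # ~1, ~[n p_1, n p_2], ~(-n p_1 p_2) = -0.36 up to rounding

# Each k_i taken by itself is binomial: summing out k_2 leaves C(n, k_1) p_1^k_1 (1 - p_1)^(n - k_1).
marginal_ok = all(
    abs(sum(table[(k1, k2)] for k2 in range(n + 1)) - comb(n, k1) * 0.2**k1 * 0.8**(n - k1)) < 1e-12
    for k1 in range(n + 1)
)
print(marginal_ok)           # True
```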

The determinant $|\lambda_{ik}|$ of the matrix $\|\lambda_{ik}\|$ of the central moments is in this case, as is easily shown, equal to the product $p_1 p_2 \cdots p_r (1 - p_1 - p_2 - \cdots - p_r)$. Thus the matrix $\|\lambda_{ik}\|$ is degenerate only in the case $r = m$. In all the remaining cases the quadratic form

$$ Q(t_1, t_2, \ldots, t_r) = \sum_{i,k} \lambda_{ik} t_i t_k = \sum_{i} p_i (1 - p_i)\, t_i^2 - 2\sum_{i<k} p_i p_k\, t_i t_k $$

is positive definite: as a form built from second central moments it is nonnegative, and its determinant $|Q| = |\lambda_{ik}|$ is positive.
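The determinant formula can be checked exactly with rational arithmetic. This sketch uses the standard `fractions` module; the helpers `moment_matrix` and `det` and the probabilities 1/10, 1/5, 3/10, 2/5 are our own illustrative choices. It builds $\|\lambda_{ik}\|$ from $\lambda_{ii} = p_i(1 - p_i)$, $\lambda_{ik} = -p_ip_k$ and compares its determinant with $p_1 p_2 \cdots p_r (1 - p_1 - \cdots - p_r)$, including the degenerate case $r = m$.

```python
from fractions import Fraction

def moment_matrix(ps):
    """Matrix of second central moments: p_i(1 - p_i) on the diagonal, -p_i p_k off it."""
    r = len(ps)
    return [[ps[i] * (1 - ps[i]) if i == k else -ps[i] * ps[k] for k in range(r)]
            for i in range(r)]

def det(a):
    """Exact determinant by Gaussian elimination (entries are Fractions)."""
    a = [row[:] for row in a]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result

ps = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]   # r = 3; outcome 4 has 2/5
print(det(moment_matrix(ps)))                  # 3/1250
print(ps[0] * ps[1] * ps[2] * (1 - sum(ps)))   # 3/1250
full = ps + [Fraction(2, 5)]                    # r = m: all outcomes included
print(det(moment_matrix(full)))                 # 0
```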

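Applying theorem 1 to the random quantity $y = \frac{1}{\sqrt{n}}(y^1 + y^2 + \cdots + y^n)$, where $y^j = (x_1^j - p_1, x_2^j - p_2, \ldots, x_r^j - p_r)$, we come to the conclusion that for $r < m$ it has a limit (as $n \to \infty$) distribution law with a probability density function of the form

$$ \frac{1}{(2\pi)^{r/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1, t_2, \ldots, t_r)}, $$

where $Q^{-1}$ denotes the quadratic form whose matrix is the inverse of $\|\lambda_{ik}\|$.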

The multivariate random quantity $z = (k_1/n - p_1, k_2/n - p_2, \ldots, k_r/n - p_r)$ is connected with the quantity $y$ by the relation $z = \frac{1}{\sqrt{n}}\, y$ and will therefore have a distribution law of the same (normal) type, but with variances $n$ times smaller than those of the quantity $y$. Consequently, the probability density function of the distribution law for $z$ is written

$$ \frac{n^{r/2}}{(2\pi)^{r/2}\sqrt{|Q|}}\, e^{-\frac{n}{2} Q^{-1}(z_1, z_2, \ldots, z_r)}. $$


It  is  now  not  difficult  to  establish  the  following  result. 
Theorem 4. Let there be given a series of independent trials with $m$ different outcomes having respectively the probabilities $p_1, p_2, \ldots, p_m$ (the same for all trials). If in a series of $n$ trials we use $k_1, k_2, \ldots, k_m$ to denote the numbers of trials terminating respectively in the 1st, 2nd, ..., $m$-th outcome, then for any positive number $\varepsilon$ given in advance, the probability $p$ of the simultaneous satisfaction of all the inequalities $|k_i/n - p_i| < \varepsilon$ ($i = 1, 2, \ldots, m$) admits the estimate

$$ p > 1 - \frac{a}{\sqrt{n}}\, e^{-bn}, $$

where $a$ and $b$ are positive constants not depending on $n$.

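For the proof of this theorem we note, first, that all the inequalities $|k_i/n - p_i| < \varepsilon$ ($i = 1, 2, \ldots, m$) are obviously satisfied if the $m - 1$ inequalities

$$ \left|\frac{k_i}{n} - p_i\right| < \frac{\varepsilon}{m - 1} \qquad (i = 1, 2, \ldots, m - 1) \qquad (59) $$

are satisfied. Indeed,

$$ \left|\frac{k_m}{n} - p_m\right| = \left|\frac{n - k_1 - k_2 - \cdots - k_{m-1}}{n} - (1 - p_1 - p_2 - \cdots - p_{m-1})\right| = \left|\left(\frac{k_1}{n} - p_1\right) + \left(\frac{k_2}{n} - p_2\right) + \cdots + \left(\frac{k_{m-1}}{n} - p_{m-1}\right)\right| < (m - 1)\frac{\varepsilon}{m - 1} = \varepsilon. $$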

It follows from the considerations preceding the formulation of theorem 4 that the quantities $z_i = k_i/n - p_i$ ($i = 1, 2, \ldots, m - 1$) have a limit (as $n \to \infty$) distribution law with the probability density function of the form $\varphi(z_1, z_2, \ldots, z_{m-1}) = a\, n^{\frac{m-1}{2}} e^{-nP(z_1, z_2, \ldots, z_{m-1})}$, where $a$ is a positive constant and $P$ is a positive definite quadratic form with coefficients not depending on $n$. For sufficiently large $n$ the probability $\bar{p}$ that at least one of the inequalities (59) is not satisfied obviously has an upper estimate of the form

$$ \bar{p} \le \int \cdots \int_{\bar{R}_1} \varphi(z_1, z_2, \ldots, z_{m-1})\, dz_1\, dz_2 \cdots dz_{m-1}, \qquad (60) $$

where the region $\bar{R}_1$ is the outer portion of the hypercube bounded by the hyperplanes $z_i = \pm\delta_1$ ($\delta_1 < \varepsilon/(m - 1)$, $i = 1, 2, \ldots, m - 1$).
After rotating the coordinate axes so as to reduce the form $P$ to a sum of squares with positive coefficients $b_1, b_2, \ldots, b_{m-1}$, we can inscribe in the (now rotated) hypercube a new hypercube $R$ bounded by the hyperplanes $z'_i = \pm\delta$ ($i = 1, 2, \ldots, m - 1$), where $0 < \delta < \delta_1$ is small enough for the new hypercube to fit ($\delta \le \delta_1/\sqrt{m - 1}$ suffices). Integration of the transformed probability density function over the region $\bar{R}$ external to the new hypercube again gives an estimate of the form (60); this estimate remains valid, and becomes simpler, if all the coefficients $b_1, b_2, \ldots, b_{m-1}$ are replaced by the smallest of them, which we denote by $g$.

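As a result we obtain the new estimate

$$ \bar{p} < a\, n^{\frac{m-1}{2}} \int \cdots \int_{\bar{R}} e^{-gn(z_1'^2 + z_2'^2 + \cdots + z_{m-1}'^2)}\, dz'_1\, dz'_2 \cdots dz'_{m-1}. \qquad (61) $$

Since the region $\bar{R}$ is contained in the union of the regions $|z'_i| \ge \delta$, and since

$$ \int_{\delta}^{\infty} e^{-gnx^2}\, dx < \frac{1}{2gn\delta}\, e^{-gn\delta^2}, \qquad \int_{-\infty}^{\infty} e^{-gnx^2}\, dx = \sqrt{\frac{\pi}{gn}}, $$

it is easy to obtain the final estimate

$$ \bar{p} < \frac{a}{g\delta\sqrt{n}}\, e^{-g\delta^2 n}, \qquad (62) $$

in which the constant $a$ has been enlarged to absorb factors depending only on $m$ and $g$.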

Denoting $a/(g\delta)$ by the letter $a$ and $g\delta^2$ by the letter $b$, we obtain the required estimate, for the present, it is true, only for all $n$ beginning with some, possibly quite large, value. We can, however, also take into account the remaining finite set $M$ of values of $n$ by increasing the constant $a$ if necessary. Since the probability $p$ is clearly greater than zero, it is sufficient to choose $a$ large enough that the right side of the estimate becomes negative for all values of $n$ belonging to the set $M$.

Thereby theorem 4 is fully proved.
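For $m = 2$ outcomes Theorem 4 reduces to a statement about the binomial law, and the exponential smallness of the failure probability can be observed directly. The sketch computes exactly, with `fractions.Fraction`, the probability that $|k/n - p| \ge \varepsilon$ for $p = 1/2$, $\varepsilon = 1/10$ (illustrative values; names are ours).

```python
from fractions import Fraction
from math import comb, exp

def failure_probability(n, p, eps):
    """Exact probability that |k/n - p| >= eps when k is binomial(n, p).

    For m = 2 outcomes k_2 = n - k_1, so the two inequalities of Theorem 4
    hold or fail together.  Fractions keep the boundary test |k - pn| >= eps*n exact.
    """
    total = Fraction(0)
    for k in range(n + 1):
        if abs(k - p * n) >= eps * n:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

p, eps = Fraction(1, 2), Fraction(1, 10)
results = {n: float(failure_probability(n, p, eps)) for n in (50, 100, 200, 400)}
for n, value in results.items():
    # Hoeffding's inequality gives the independent bound 2 exp(-2 n eps^2).
    print(n, value, 2 * exp(-2 * n * float(eps) ** 2))
```

For comparison the loop also prints Hoeffding's bound $2e^{-2n\varepsilon^2}$, a standard inequality that is not used in the text; the exact values decrease with $n$ and stay below it, which is consistent with the $e^{-bn}$ behaviour asserted by Theorem 4.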

In concluding the present section we shall describe still another frequently encountered distribution, the so-called Poisson distribution. This distribution can be treated as an approximation for the binomial distribution under the condition that the number of trials $n$ is large, the probability of success $p$ in each trial is small, and the product $\lambda = np$ is neither small nor large. In this case the probability $P_k$ that exactly $k$ trials lead to success is expressed by the approximate equation

$$ P_k \approx \frac{\lambda^k}{k!}\, e^{-\lambda}. \qquad (63) $$

In particular, for $k = 0$ we have $P_0 \approx e^{-\lambda}$.

The Poisson distribution attains its maximum probability at the largest value of $k$ satisfying the inequality $k < \lambda$. In the theory of discrete self-organizing systems we encounter the Poisson distribution when automata are taught words or sequences of words of differing length. When the words used in the teaching are selected at random, the Poisson distribution frequently gives a sufficiently good approximation for the distribution law of the word lengths.

We note that the mean value of a quantity having the Poisson distribution (63) is equal to $\lambda$.
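A small numerical comparison of the binomial law with approximation (63), assuming n = 1000 and p = 0.003 so that $\lambda = np = 3$ (illustrative values; names are ours). It also checks that the mean of the Poisson law is $\lambda$.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Approximation (63): probability of exactly k successes, lam = np."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 1000, 0.003            # many trials, small p, lam = np = 3
lam = n * p
for k in range(7):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))

# Mean of the Poisson law; terms beyond k = 60 are negligible for lam = 3.
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
print(round(mean, 6))         # 3.0
```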

§3.  A  QUANTITATIVE  MEASURE  OF  SELF-ORGANIZATION  AND  SELF-IMPROVEMENT 
IN  AUTOMATA 

In the first section we encountered the concept of self-adaptation in automata: it is natural to term an automaton self-adaptive if in the course of time it changes its responses to the questions fed to it (for some cycling of the input and output information). However, not every self-adaptation should be identified with self-organization. In accordance with the intuitive idea of self-organization, we should term self-organizing an automaton whose responses become better organized as the organization of its learning histories improves. For a quantitative characterization of this improvement it is natural to make use of the probability-theoretic concept known as entropy.

We shall use the entropy concept only for discrete random quantities. Let there be given a discrete random quantity $x$ with the domain of definition $R$ and with the distribution law $f(x)$. The entropy of this quantity, or, what is the same, the entropy of the distribution $f(x)$, is the term given to the negative of the sum, taken over the domain of definition $R$ of the given random quantity, of the products of the probabilities $f(x)$ and their logarithms:

$$ H = -\sum_{x \in R} f(x) \log f(x). \qquad (64) $$

In using this equation it is assumed that for $f(x) = 0$ the product $f(x) \log f(x)$ is zero. Any number strictly greater than unity can be selected as the base of the logarithms. It is easy to see that changing the base of the logarithms multiplies the entropies of all distribution laws by the same constant factor. In practice, binary (base two), natural, or decimal logarithms are commonly used.

As is shown in information theory (see, for example, Goldman [32]), entropy is the natural measure of the indefiniteness (uncertainty) of the values of a random quantity: the greater this indefiniteness, the larger the value of the entropy. In particular, if the random quantity can take only two values with the probabilities $p$ and $q = 1 - p$ respectively, the maximal value of the entropy is achieved when these probabilities are equal, $p = q = 1/2$, which corresponds to the intuitive concept of the maximal possible indefiniteness in this case. If, however, one of the probabilities $p$ or $q$ vanishes, then, as is easily seen, the value of the entropy also vanishes, which again agrees well with common sense, since in this case there is actually no indefiniteness.
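Equation (64) translates directly into code. The sketch below (names are ours) uses the convention $0 \cdot \log 0 = 0$, shows that changing the base only rescales the entropy, and tabulates the two-outcome entropy, which peaks at $p = 1/2$ and vanishes at $p = 0$ and $p = 1$.

```python
from math import e, log

def entropy(probs, base=e):
    """Entropy (64): H = -sum f log f, with the convention 0 * log 0 = 0."""
    return sum(-f * log(f, base) for f in probs if f > 0)

f = [0.5, 0.25, 0.25]
bits = entropy(f, 2)
nats = entropy(f)                   # natural logarithms
print(bits, nats / log(2))          # a change of base only rescales: both ~1.5

# Two outcomes with probabilities p and 1 - p.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(entropy([p, 1 - p], 2), 4))
```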

When two independent random quantities $x$ and $y$ are combined into one multivariate random quantity $z = (x, y)$ (with dimension equal to the sum of the dimensions of the quantities $x$ and $y$), the entropy of the distribution of the quantity $z$ is equal to the sum of the entropies of the distributions of the quantities $x$ and $y$.

Indeed, if the distribution laws of the quantities $x$ and $y$ are given by the functions $f_1(x)$ and $f_2(y)$, then the distribution law of the quantity $z$ is obviously given by the product $f_1(x) f_2(y)$ of these functions. The entropy $H_z$ of the quantity $z$ is then calculated from the equation

$$ H_z = -\sum_{x \in P_1} \sum_{y \in P_2} f_1(x) f_2(y) \log [f_1(x) f_2(y)] = -\sum_{y \in P_2} f_2(y) \sum_{x \in P_1} f_1(x) \log f_1(x) - \sum_{x \in P_1} f_1(x) \sum_{y \in P_2} f_2(y) \log f_2(y) = H_x + H_y, $$

where $P_1$ and $P_2$ are the domains of definition of the quantities $x$ and $y$, and $H_x$ and $H_y$ are their entropies.

The  property  of  the  entropies  of  the  independent  distributions 
which  leads  to  the  formation  of  their  sum  when  these  distributions  are 
combined  into  one  is  termed  the  entropy  additivity  property. 
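Additivity can be checked on a small example. The two laws below are arbitrary illustrative choices; the joint law of $z = (x, y)$ is their product, and its entropy matches $H_x + H_y$ up to rounding.

```python
from itertools import product
from math import log

def entropy(probs):
    """Entropy (64) in natural logarithms; zero probabilities contribute nothing."""
    return sum(-f * log(f) for f in probs if f > 0)

f1 = {"a": 0.2, "b": 0.8}             # law of x (illustrative)
f2 = {0: 0.5, 1: 0.3, 2: 0.2}         # law of y, independent of x (illustrative)
joint = {(x, y): f1[x] * f2[y] for x, y in product(f1, f2)}   # law of z = (x, y)

H_z = entropy(joint.values())
H_x, H_y = entropy(f1.values()), entropy(f2.values())
print(H_z, H_x + H_y)                 # equal up to rounding
```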

For automata operating in the simple question-response cycle (without evaluation of the quality of the response), we can approach the definition of the degree of self-organization by considering two entropy characteristics of the automaton: the learning entropy and the examination entropy. In the following discussion we shall basically follow the work [25].

Let $P = (p_1, p_2, \ldots, p_n)$ be the sequence of questions (input words) supplied to the automaton in its learning period. We will term this sequence the learning sequence. Let us assume that in a particular fixed series of experiments with the automaton, for each learning sequence $P$ there is given the probability $p(P)$ of the appearance of this sequence in the experiments of the series under consideration (it is assumed that within the limits of the given series this probability does not vary from experiment to experiment). This specifies some distribution $R$ of the probabilities $p(P)$ of the learning sequences. The entropy of this distribution, which we shall term the learning entropy for the given learning distribution law, is calculated from the familiar equation

$$ H_R(\text{learn}) = -\sum_{P} p(P) \log p(P). \qquad (65) $$

For definiteness we agree to use natural logarithms for the computation of the entropies.

In the case when the automaton $A$ and its initial state $a_0$ are fixed, every distribution of the probabilities $p(P)$ of the learning sequences uniquely determines some distribution of the probabilities $\alpha(a)$ on the set of all states of this automaton. Here $\alpha(a)$ denotes the probability that after termination of the learning process the automaton will be in the state $a$. If we use $S_a$ to denote the event at the input of the automaton represented by the state $a$ (the set of input words transferring the automaton from the initial state into the state $a$), then we obtain

$$ \alpha(a) = \sum_{P \in S_a} p(P), \qquad (66) $$

where the summation extends over all words of the form $P = p_1 p_2 \cdots p_n$ contained in $S_a$ (for brevity, the sequence of words $P = (p_1, p_2, \ldots, p_n)$ is identified here with the word $p_1 p_2 \cdots p_n$ composed of the elements of this sequence).

Now let us fix some probability distribution $\gamma(p)$ of the questions $p$ applied to the automaton after termination of its learning process. The distributions $\alpha(a)$ and $\gamma(p)$, together with the switching and output functions of the automaton $A$ under consideration, uniquely define the probability distribution $\beta(p, q)$ of the pairs "question ($p$) - answer ($q$)". We term the entropy of the latter distribution the examination entropy and denote it by $H_Q(\text{exam})$. We use $Q$ to denote the so-called law of experimentation with the automaton, which is the combination of two distribution laws: the distribution of the learning sequences and the distribution of the examination questions.

The quantity $H_Q(\text{exam})$ is determined from the equation

$$ H_Q(\text{exam}) = -\sum_{p, q} \beta(p, q) \log \beta(p, q). \qquad (67) $$
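Equations (66) and (67) can be evaluated by brute force when the learning sequences are generated by independent trials. The two-state automaton below is hypothetical (input x keeps the state, input y switches it; state 1 answers u and state 2 answers v), as are the letter probabilities 0.7/0.3 for learning, 1/2 for examination, and the length n = 4. The sketch enumerates all learning words, accumulates $\alpha(a)$, builds $\beta(p, q)$ from the output function and returns the examination entropy in natural logarithms.

```python
from itertools import product
from math import log

# Hypothetical two-state automaton (not from the text): x keeps the state,
# y switches it; state 1 answers u and state 2 answers v to any question.
delta = {(1, "x"): 1, (2, "x"): 2, (1, "y"): 2, (2, "y"): 1}       # switching function
lam = {(1, "x"): "u", (1, "y"): "u", (2, "x"): "v", (2, "y"): "v"}  # output function

def state_distribution(a0, letter_probs, n):
    """alpha(a) of eq. (66): total probability p(P) of the learning words P of
    length n that take the automaton from a0 to a (independent trials assumed)."""
    alpha = {}
    for word in product(letter_probs, repeat=n):
        state, weight = a0, 1.0
        for letter in word:
            state = delta[(state, letter)]
            weight *= letter_probs[letter]
        alpha[state] = alpha.get(state, 0.0) + weight
    return alpha

def exam_entropy(alpha, gamma):
    """H_Q(exam) of eq. (67), natural logs: entropy of beta(p, q), where
    beta(p, q) sums alpha(a) * gamma(p) over the states a with lam(a, p) = q."""
    beta = {}
    for (a, wa), (p, wp) in product(alpha.items(), gamma.items()):
        key = (p, lam[(a, p)])
        beta[key] = beta.get(key, 0.0) + wa * wp
    return sum(-b * log(b) for b in beta.values() if b > 0)

alpha = state_distribution(1, {"x": 0.7, "y": 0.3}, n=4)
H_exam = exam_entropy(alpha, {"x": 0.5, "y": 0.5})
print(alpha, H_exam)
```

For this automaton the state is 1 exactly when the number of y's is even, so $\alpha(1) = (1 + 0.4^4)/2 = 0.5128$, and $H_Q(\text{exam}) = \ln 2 - \alpha(1)\ln\alpha(1) - \alpha(2)\ln\alpha(2)$.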

Using, if necessary, the operation of cyclic reduction of automata, we can, without loss of generality, consider only sequences of single-letter questions and responses. Here the learning sequences $P$ are converted into words consisting of the individual letters of their questions, arranged in the order in which they were applied to the automaton in the learning process.

For the further construction of the theory it is necessary to specify some class of laws of experimentation with the automaton and to assign to each law $Q$ occurring in this class some probability, or, in the case of continuous distribution laws, some probability density $\varphi(Q)$.

The simplest case is the scheme of independent trials, when at each step, both in the learning regime and in the examination regime, the probability $\gamma(p)$ of the appearance of any given question is constant and depends only on this question. In view of the limitation to one-cycle automata, specifying the law $Q$ of experimentation with the automaton is equivalent in this case to assigning certain probabilities $v_i = v(x_i)$ of the appearance at the input of the automaton of each of the letters $x_i$ of its input alphabet. The sum of all the $v_i$ must, of course, be equal to unity in this case.

In the case of the scheme of independent trials, the law of experimentation with the automaton is naturally identified with the vector $v = (v_1, v_2, \ldots)$ consisting of the probabilities of the appearance of the different input letters (it is assumed that the input alphabet is ordered in some fashion). The class of laws is naturally identified with the set of all vectors $v = (v_1, v_2, \ldots)$ satisfying the natural constraints $0 \le v_i \le 1$ and $\sum_i v_i = 1$ ($i = 1, 2, \ldots$), with a uniform distribution law given on this set. We agree to call the scheme of independent trials with this choice of the class of distribution laws the uniform scheme of independent trials. Here we limit ourselves to the case when the length of the learning sequence is fixed, or, when necessary, we shall assume that these lengths are described by some distribution law (most frequently the Poisson law).

If there is given some law of experimentation $Q$, then, as noted above, it includes two distribution laws: the law of distribution of the learning sequences and the law of distribution of the examination questions. The corresponding random quantities are to be considered independent in the usual organization of experiments on self-improving automata. Therefore the entropy of the joint distribution of these two quantities, which we shall agree to term the entropy of the corresponding law of experimentation $Q$ and to designate by $H^Q$, will be equal to the sum of two entropies: the learning entropy $H_Q(\text{learn})$ and the entropy of the examination questions $H_Q(\text{quest})$. The latter entropy must not be confused with the examination entropy $H_Q(\text{exam})$, which relates not to the distribution of the examination questions but to the distribution of the question-response pairs. The examination entropy depends not only on the distribution of the questions and the distribution of the learning sequences, but also on the automaton itself, while the entropy of the law of experimentation with the automaton does not depend on the automaton.
Let us assume that in the class $K$ of laws of experimentation with the automaton there is fixed some law $Q_0$ which has the maximal possible entropy $H^{Q_0}$. Introducing the increments of the entropies of experimentation and examination by the equations

$$ \Delta H^Q = H^Q - H^{Q_0}, \qquad \Delta H_Q(\text{exam}) = H_Q(\text{exam}) - H_{Q_0}(\text{exam}), \qquad (68) $$

we obtain for any automaton $A$ and class $K$ of laws of experimentation with the automaton (with the probability density $\varphi(Q)$) the possibility of introducing two averaged characteristics:

$$ s(A, K) = -\int_{K} \Delta H_Q(\text{exam})\, \varphi(Q)\, dQ, \qquad (69) $$

$$ z(A, K) = \int_{K} \frac{\Delta H_Q(\text{exam})}{\Delta H^Q}\, \varphi(Q)\, dQ. \qquad (70) $$

The integrals in these equations are taken over the region consisting of all the laws of the class $K$ under consideration. The larger the value of these integrals, the greater the average capacity of the automaton $A$ under consideration for self-organization. The zero value corresponds to the absence of capability for self-organization, and negative values indicate that as the organization of the learning improves, the organization of the responses of the automaton on the average deteriorates. In other words, the automaton behaves as a "self-disorganizing" system rather than as a "self-organizing" system.

Since equation (70) leads to considerably more complex computations than equation (69), we shall select as the basic quantitative criterion for evaluating the capability of an automaton for self-organization the criterion $s(A, K)$ rather than the criterion $z(A, K)$.

Let  us  consider  as  an  example  the  two  automata  A  and  B  whose 
switching  and  output  functions  are  given  by  the  tables: 
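for automaton A:

      | 1 2         | 1 2
    --+----       --+----
    x | 1 1       x | u v
    y | 2 2       y | u v

for automaton B:

      | 1 2         | 1 2
    --+----       --+----
    x | 1 2       x | u v
    y | 2 2       y | u v

In these tables the numbers 1 and 2 denote the states of the automata, the letters $x$, $y$ denote the input signals (questions), and the letters $u$, $v$ denote the output signals (responses). In each pair, the left table gives the switching function (the next state, for the current state in the column and the input in the row) and the right table gives the output function.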

As the class $K$ of distribution laws we select the class in which the probabilities of the occurrence of the examination questions $x$ and $y$ are equal, and the distribution laws of the learning sequences result from the scheme of independent trials with the probabilities of the occurrence of the questions $x$ and $y$ equal to $p$ and $1 - p$ respectively ($p$ runs through all values from 0 to 1 within the class $K$, with equal probabilities). In addition, we fix the length $n$ of the learning sequences, and we denote the criterion $s(A, K)$ corresponding to the selected value of $n$ by $s_n(A, K)$.

The automaton A will be in the state 1 if the last question given to it during learning was $x$, and in the state 2 if the last question given to it was $y$. This implies that the probabilities of the question-response pairs will be: $\frac{1}{2}p$ for the pair $(x, u)$, $\frac{1}{2}p$ for the pair $(y, u)$, $\frac{1}{2}(1 - p)$ for the pair $(x, v)$ and also $\frac{1}{2}(1 - p)$ for the pair $(y, v)$. Consequently, the examination entropy is

$$ H_Q(\text{exam}) = -\tfrac{1}{2}p \ln \tfrac{1}{2}p - \tfrac{1}{2}p \ln \tfrac{1}{2}p - \tfrac{1}{2}(1 - p) \ln \tfrac{1}{2}(1 - p) - \tfrac{1}{2}(1 - p) \ln \tfrac{1}{2}(1 - p) = -p \ln p - (1 - p) \ln (1 - p) - \ln \tfrac{1}{2}. $$
The maximal entropy of the experimentation is obviously attained at $p = 1 - p = 1/2$. In this case the examination entropy is given by the expression

$$ H_{Q_0}(\text{exam}) = -\tfrac{1}{2} \ln \tfrac{1}{2} - \tfrac{1}{2} \ln \tfrac{1}{2} - \ln \tfrac{1}{2} = -2 \ln \tfrac{1}{2}. $$

The increment of the examination entropy is

$$ \Delta H_Q(\text{exam}) = -p \ln p - (1 - p) \ln (1 - p) + \ln \tfrac{1}{2}. $$

The probability density of the laws $Q$ in the selected class is clearly equal to unity. Application of equation (69) gives

$$ s_n(A, K) = \int_0^1 \left[p \ln p + (1 - p) \ln (1 - p) - \ln \tfrac{1}{2}\right] dp = \ln 2 - \tfrac{1}{2} \approx 0.19. $$
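The value $\ln 2 - 1/2$ can be confirmed numerically. The sketch integrates the increment $\Delta H_Q(\text{exam})$ of automaton A over $p \in [0, 1]$ with a midpoint rule (the quadrature and the names are our choices, not the book's).

```python
from math import log

def delta_exam_A(p):
    # Delta H_Q(exam) for automaton A: -p ln p - (1 - p) ln(1 - p) + ln(1/2).
    h = sum(-t * log(t) for t in (p, 1 - p) if t > 0)
    return h + log(0.5)

def integrate_01(f, steps=100_000):
    # Midpoint rule on [0, 1]; the density of p over the class K equals 1.
    w = 1.0 / steps
    return w * sum(f((i + 0.5) * w) for i in range(steps))

s_A = -integrate_01(delta_exam_A)              # criterion (69)
print(round(s_A, 4), round(log(2) - 0.5, 4))   # 0.1931 0.1931
```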

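The automaton B will be in the state 1 only when the learning sequence consists of $x$'s alone. The probability of this is obviously $p^n$. Hence the probabilities of the examination pairs $(x, u)$ and $(y, u)$ are equal to $\frac{1}{2}p^n$, and the probabilities of the pairs $(x, v)$ and $(y, v)$ are equal to $\frac{1}{2}(1 - p^n)$. The law $Q_0$ of maximal entropy again corresponds to $p = 1/2$, for which the state 1 has probability $2^{-n}$. Just as in the case of finding $s_n(A, K)$, we find $\Delta H_Q(\text{exam})$, as a result of which we obtain the sequence of equations

$$ s_n(B, K) = \int_0^1 \Big[p^n \ln \tfrac{1}{2}p^n + (1 - p^n) \ln \tfrac{1}{2}(1 - p^n) - \tfrac{1}{2^n} \ln \tfrac{1}{2^{n+1}} - \Big(1 - \tfrac{1}{2^n}\Big) \ln \tfrac{1}{2}\Big(1 - \tfrac{1}{2^n}\Big)\Big] dp $$

$$ = -\frac{n}{(n+1)^2} - \frac{1}{n+1} + \sum_{k=2}^{\infty} \frac{1}{k(k-1)(kn+1)} + \frac{n \ln 2 + 1}{2^n} - \left(\frac{1}{1 \cdot 2 \cdot 2^{2n}} + \frac{1}{2 \cdot 3 \cdot 2^{3n}} + \cdots\right). $$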

Using this last relation we obtain the estimate [formula illegible in the source; the surviving fragments show the quantities $(n + 1)^2$ and $4n$].


From this estimate we easily find that for $n \ge 5$ the quantity $s_n(B, K)$ is negative. In other words, when learning uses sequences of length greater than 4, in the selected class of laws of experimentation the automaton B is on the average "self-disorganizing", while the automaton A under the same conditions displays a capability for self-organization.
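The sign change of $s_n(B, K)$ can also be checked numerically, directly from definition (69) rather than from the series. In this sketch (midpoint quadrature and names are ours) the reference law $Q_0$ is $p = 1/2$, for which automaton B ends in state 1 with probability $2^{-n}$.

```python
from math import log

def h(u):
    # Binary entropy in nats, with 0 ln 0 = 0.
    return sum(-t * log(t) for t in (u, 1 - u) if t > 0)

def exam_entropy_B(p, n):
    # Pairs (x,u), (y,u) have probability p^n / 2; pairs (x,v), (y,v) have (1 - p^n) / 2.
    return h(p**n) + log(2)

def s_n_B(n, steps=20_000):
    # Criterion (69): Q_0 is the maximal-entropy law p = 1/2; density of p is 1 on [0, 1].
    h0 = exam_entropy_B(0.5, n)
    w = 1.0 / steps
    return -w * sum(exam_entropy_B((i + 0.5) * w, n) - h0 for i in range(steps))

for n in range(1, 9):
    print(n, round(s_n_B(n), 4))
```

With this quadrature $s_1(B, K)$ equals $\ln 2 - 1/2$ (for $n = 1$ automata A and B coincide), $s_2$ and $s_3$ are positive, and $s_n$ is negative from $n = 4$ on ($s_4 \approx -0.05$); so the condition $n \ge 5$ obtained from the estimate is sufficient rather than sharp.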

We note that the conclusion about the capability or incapability of an automaton for self-organization depends on the choice of the class of laws of experimentation with this automaton. If, for example, in the example considered we select as $K$ the class of laws of experimentation resulting from the uniform scheme of independent trials, then, as is easily verified, the automaton B also becomes self-organizing on the average, although the magnitude of this self-organization remains less than that of the automaton A.

In passing from the concept of self-organization to the concept of self-learning, we can no longer be satisfied with purely probability-theoretic concepts. It is necessary to introduce concepts which characterize the particular directionality of the self-organization process. To do this it is most natural to introduce a real function $f(p, q)$, defined on the set of all possible question ($p$) - response ($q$) pairs, whose value characterizes the quality of any response $q$ to any given question $p$.

As we noted above, for any given automaton A with fixed initial state $a_0$, specifying the law $Q$ of the probability distribution $p(P)$ on the learning sequences $P$ uniquely determines the probability distribution $\alpha(a)$ on the set of the automaton states. Let us further denote by $q = \lambda(a, p)$ the response of the automaton A, first brought into the state $a$, to the question $p$. The quantity

$$ f_Q = \sum_{p, a} f(p, \lambda(a, p))\, \gamma(p)\, \alpha(a) $$

is the averaged criterion of the quality of the responses of the automaton in the "examination" when it has been taught by sequences distributed according to the law $Q$ ($\gamma(p)$ is the probability of the appearance of the question $p$ in the examination).

It is natural to use the term self-learning content of the automaton A to denote the difference $f_Q - f_{Q_0}$, where $Q_0$ is the a priori probability distribution law of the learning sequences, known to the designer at the time the automaton is constructed, and Q is the a posteriori distribution law which actually holds for some class of learning experiments. As a rule, the entropy of the distribution $Q_0$ is greater than the entropy of the distribution Q.

If now there is given a class K of a posteriori distribution laws Q with the probability density φ(Q), then the integral

$$b(A, K) = \int_{Q \in K} \bigl(f_Q - f_{Q_0}\bigr)\,\varphi(Q)\,dQ$$

is the averaged quantitative characteristic of the capability of the considered automaton for self-learning (for the selected class K, the automaton A and the real function f).

§4. AUTOMATA WITH RANDOM TRANSITIONS

In addition to determinate automata, in the theory of self-organizing systems we must consider automata which have random transitions. As is known (see Chapter 3), in a determinate automaton the specification of the preceding state a(t − 1) and the current input signal x(t) uniquely determines the next state a(t), into which the automaton passes from the state a(t − 1) under the influence of this input signal. In an automaton with random transitions, the specification of the pair a(t − 1), x(t) determines only the probability $p_{ij}(x)$ of the transition of the automaton from the state a(t − 1), which we denote by $a_i$, into each state $a_j$ under the influence of the input signal x(t) = x.

It is easy to see that a determinate automaton can be considered as a particular case of an automaton with random transitions in which, for each x and any given i, exactly one of the probabilities $p_{ij}(x)$ is equal to unity and all the remaining ones are equal to zero.
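Here is a minimal Python sketch of this embedding. A hypothetical transition function `delta` of a determinate automaton is turned into one 0/1 matrix per input signal, and each row of each matrix holds exactly one unit.

```python
# Sketch: a determinate automaton as a special case of an automaton with random
# transitions.  `delta` and the example automaton are hypothetical; states and
# inputs are indexed by their position in the given lists.


def to_stochastic(delta, states, inputs):
    """Return {x: matrix} with p_ij(x) = 1 if delta(a_i, x) == a_j, else 0."""
    index = {s: k for k, s in enumerate(states)}
    matrices = {}
    for x in inputs:
        m = [[0.0] * len(states) for _ in states]
        for s in states:
            m[index[s]][index[delta(s, x)]] = 1.0   # exactly one unit per row
        matrices[x] = m
    return matrices


# Example: a two-state "parity" automaton that flips its state on input 1.
mats = to_stochastic(lambda s, x: s ^ x, states=[0, 1], inputs=[0, 1])
print(mats[0])   # [[1.0, 0.0], [0.0, 1.0]]: input 0 keeps the state
print(mats[1])   # [[0.0, 1.0], [1.0, 0.0]]: input 1 flips the state
```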

It is natural to specify every automaton with random transitions by the system of matrices $\|p_{ij}(x)\|$, where x runs successively through all the input signals of the automaton. Of course, in addition to these matrices the output function and the initial state of the automaton must also be given.

The matrices $\|p_{ij}(x)\|$ have the property that the sum of the elements of any of their rows is equal to unity. We shall also assume that the automaton has no states into which the transition probabilities from all states are equal to zero. Obviously, this means that the matrices $\|p_{ij}(x)\|$ have no columns consisting only of zeros. In addition, all the elements of each of the matrices $\|p_{ij}(x)\|$ are nonnegative real numbers which do not exceed unity.

Matrices satisfying the three listed properties will be termed stochastic matrices (the usual definition requires only nonnegative elements and unit row sums). Thus, in the case of automata with random transitions, the role of the switching function is played by the function $\|p_{ij}(x)\|$, which maps the set of all input signals of the automaton into the set of stochastic matrices.
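The three properties can be checked mechanically. The sketch below is illustrative only. The hypothetical flag `require_nonzero_columns` switches off the no-zero-column condition, which this text adds to the usual definition of a stochastic matrix.

```python
# Sketch: test the three properties of a stochastic matrix as listed in the text.


def is_stochastic(m, require_nonzero_columns=True, tol=1e-12):
    entries_ok = all(0.0 <= v <= 1.0 for row in m for v in row)      # 0 <= p_ij <= 1
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in m)           # unit row sums
    cols_ok = all(any(row[j] > 0.0 for row in m) for j in range(len(m[0])))  # no zero column
    return entries_ok and rows_ok and (cols_ok or not require_nonzero_columns)


print(is_stochastic([[0.7, 0.3], [0.2, 0.8]]))                                # True
print(is_stochastic([[1.0, 0.0], [1.0, 0.0]]))                                # False: zero column
print(is_stochastic([[1.0, 0.0], [1.0, 0.0]], require_nonzero_columns=False))  # True
```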

Of particular interest are automata with random transitions which have one single (constant) input signal. Such automata are studied in classical probability theory under the name of uniform (homogeneous) discrete Markov chains. The output signals of such automata are ignored (or identified with the states), which makes it possible to specify these automata by a single stochastic matrix. For definiteness, it is customarily assumed that the first row (and the first column as well) of this matrix corresponds to the initial state of the automaton (Markov chain).

Markov chains also have another (non-automaton) interpretation, in the terms customarily used in probability theory. This interpretation is based on the concept of trials, considered in the preceding section. Here, however, we must consider not independent trials but trials in which the probabilities of the particular outcomes of each successive trial depend on the outcome of the directly preceding trial and do not depend directly on the outcomes of any of the earlier trials (the set of possible outcomes itself does not change from trial to trial). There then arises the matrix $\|p_{ij}\|$ of the so-called transition probabilities. Any element $p_{ij}$ of this matrix is the probability of the jth outcome in each successive trial, under the condition that the outcome of the trial directly preceding it was the ith.

It is easy to see that this treatment is completely equivalent to the automaton treatment: the trial outcome is, essentially, simply another name for the state of the automaton with random transitions (having a single input signal). There is, it is true, one difference. In the automaton with random transitions a completely determined initial state was fixed, whereas in Markov chains it is customary to specify the probabilities $p_1, p_2, \dots, p_n$ of the various outcomes of the initial trial. This corresponds to a random selection of the initial state of the automaton, in which the ith state is selected as the initial state with probability $p_i$ (i = 1, 2, ..., n).
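The trial interpretation translates directly into a sampler. This sketch assumes 0-based state indices and uses the standard `random.Random.choices` method. A degenerate initial distribution such as (1, 0) reproduces the fixed initial state of the automaton treatment.

```python
import random


def simulate_chain(P, initial, steps, rng):
    """Sample a path of a finite Markov chain.

    P       -- transition matrix, P[i][j] = p_ij
    initial -- probabilities p_1, ..., p_n of the outcomes of the initial trial
    rng     -- a random.Random instance (seeded for reproducibility)
    Returns the list of visited state indices (0-based), of length steps + 1.
    """
    states = range(len(P))
    state = rng.choices(states, weights=initial)[0]        # random initial state
    path = [state]
    for _ in range(steps):
        state = rng.choices(states, weights=P[state])[0]   # depends only on the last outcome
        path.append(state)
    return path


rng = random.Random(1)
print(simulate_chain([[0.7, 0.3], [0.2, 0.8]], [0.5, 0.5], 10, rng))
# A degenerate initial distribution gives a fixed initial state:
print(simulate_chain([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], 5, rng))  # [0, 1, 0, 1, 0, 1]
```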

Thus, for a more complete analogy with Markov chains it is necessary to consider not the simple automata with random transitions (having a single input signal) but the so-called random automata. In these, not only the transition function but also the selection of the initial state is random. If, in addition, the output signals are taken into consideration, then the output function must, generally speaking, also be random. In other words, the output function must specify not simply the output signal, but some probability distribution on the set of all possible output signals.

Markov chains (or, correspondingly, random automata) are termed finite or infinite depending on whether the set of possible outcomes (or, correspondingly, the set of states of the automaton) is finite or infinite. For finite random automata of general form we shall also require finiteness of the sets of their input and output signals. We shall limit ourselves to the study of uniform Markov chains only, i.e., chains in which the matrix of transition probabilities is constant. We will not encounter nonuniform Markov chains (whose transition probability matrix depends on time) in what follows. Therefore, for brevity, we shall speak simply of Markov chains, meaning, unless otherwise stipulated, uniform chains.

Let us consider the automaton A with random transitions (Markov chain) whose transition probability matrix is $P = \|p_{ij}\|$. As mentioned above, an arbitrary element $p_{ij}$ of this matrix is the probability of the transition of the automaton A from the ith state into the jth. It is important to emphasize that here we are speaking of a transition in one cycle (i.e., in the interval between two neighboring moments of discrete automaton time). It is easy to see that the product $p_{ik}\,p_{kj}$ is the probability of the transition of the automaton A from the ith state into the jth in two cycles, under the condition that the automaton passes through the kth state.


The sum $\sum_k p_{ik}\,p_{kj}$, extended over all the states k, obviously gives the total probability of the transition of the automaton A from the ith state into the jth in two cycles. Moreover, this sum is clearly the element of the matrix $P \cdot P = P^2$ standing at the intersection of the ith row and the jth column. Similarly, the probability of the transition of the automaton A from the ith state into the jth after three cycles of operation is equal to the (i, j)th element of the matrix $P^2 \cdot P = P^3$. Continuing in the same way, we arrive at the following proposition.

Theorem 1. For a random automaton A with a single input signal (uniform Markov chain) whose transition probability matrix is P, the probability of transition from the ith state into the jth after exactly n cycles is equal to the element of the matrix $P^n$ standing at the intersection of the ith row and the jth column (n = 1, 2, 3, ...).
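Theorem 1 can be checked numerically on a small example. The sketch below is in pure Python, and its helper names are hypothetical. It computes $P^n$ by repeated multiplication. Independently, it sums the products $p_{ik_1} p_{k_1 k_2} \cdots p_{k_{n-1} j}$ over all intermediate states, which is the path argument given above.

```python
from itertools import product


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(P, n):
    """P**n by repeated multiplication (P**0 is the unit matrix E)."""
    result = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def n_step_by_paths(P, i, j, n):
    """Sum over intermediate states k_1..k_{n-1} of p_{i k1} p_{k1 k2} ... p_{k_{n-1} j}."""
    total = 0.0
    for middle in product(range(len(P)), repeat=n - 1):
        path = (i, *middle, j)
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        total += prob
    return total


P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
P3 = mat_pow(P, 3)
print(P3[0][2], n_step_by_paths(P, 0, 2, 3))   # both 0.1875
```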

It is natural to term the elements of the matrix $P^n$ the n-step transition probabilities. To determine these probabilities in the case of finite Markov chains one usually uses the so-called Perron equation, which is derived in matrix theory.

Let us first recall certain definitions and concepts of this theory.

Let there be given the matrix $P = \|p_{ij}\|$ of nth order. The determinant

$$
P(\lambda) = \begin{vmatrix}
\lambda - p_{11} & -p_{12} & \cdots & -p_{1n} \\
-p_{21} & \lambda - p_{22} & \cdots & -p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-p_{n1} & -p_{n2} & \cdots & \lambda - p_{nn}
\end{vmatrix}
$$

is a polynomial of nth degree in λ, termed the characteristic polynomial of the matrix P. The roots of this polynomial are termed the eigenvalues of the matrix P.

Let us denote by E the unit matrix of nth order. Then the element of the matrix $(\lambda E - P)^{-1}$ located at the intersection of the ith row and the jth column is equal to $P_{ji}(\lambda)/P(\lambda)$. Here $P_{ji}(\lambda)$ is the algebraic complement (cofactor) of the element of the determinant P(λ) located at the intersection of its jth row and ith column.


Now let the matrix $P = \|p_{ij}\|$ have the distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_r$. We denote by $m_i$ the multiplicity of the number $\lambda_i$, in other words, the maximal number s such that the characteristic polynomial P(λ) is divisible by $(\lambda - \lambda_i)^s$. We define the polynomial $\psi_i(\lambda)$ by the equation

$$\psi_i(\lambda) = \frac{P(\lambda)}{(\lambda - \lambda_i)^{m_i}}.$$

Then the element $p_{ij}^{(k)}$ of the matrix $P^k$, located at the intersection of the ith row and the jth column, can be determined from the equation

$$p_{ij}^{(k)} = \sum_{\nu=1}^{r} \frac{1}{(m_\nu - 1)!}\, D^{m_\nu - 1}\left[\frac{\lambda^k P_{ji}(\lambda)}{\psi_\nu(\lambda)}\right]_{\lambda = \lambda_\nu}. \qquad (71)$$

Equation (71) is the Perron equation. In it, $D^{m_\nu - 1}$ denotes the derivative of order $m_\nu - 1$ with respect to λ. The substitution $\lambda = \lambda_\nu$ must be performed after the differentiation. The derivation of the Perron equation can be found in any monograph on the theory of finite Markov chains (see, for example, Romanovskiy [68]).


The Perron equation takes a particularly simple form when all the eigenvalues of the matrix P have multiplicity equal to unity, i.e., when $m_1 = m_2 = \dots = m_r = 1$. Clearly, in this case r = n. The factorial of zero is unity, and a derivative of zero order means that no differentiation is performed. Therefore, for this particular case we obtain the simple equation

$$p_{ij}^{(k)} = \sum_{\nu=1}^{n} \frac{\lambda_\nu^k\, P_{ji}(\lambda_\nu)}{\psi_\nu(\lambda_\nu)} \qquad (i, j = 1, 2, \dots, n). \qquad (72)$$
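As a sanity check of equation (72), the following sketch evaluates its right-hand side and compares it with $P^k$ obtained by direct multiplication. It assumes the distinct eigenvalues are supplied by the caller; for the 2 × 2 matrix used below they are 1 and 1 − p_0 − p_1. It evaluates $\psi_\nu(\lambda_\nu)$ as the product of $(\lambda_\nu - \lambda_\mu)$ over μ ≠ ν, which is valid because P(λ) is monic. Determinants are computed by cofactor expansion, which is adequate only for small n. All function names are hypothetical.

```python
# Sketch: numerical check of the special Perron equation (72).


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if not M:
        return 1.0
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))


def cofactor(M, r, c):
    """Algebraic complement of the element in row r, column c."""
    minor = [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]
    return (-1) ** (r + c) * det(minor)


def perron_special(P, eigenvalues, k):
    n = len(P)
    result = [[0.0] * n for _ in range(n)]
    for nu, lam in enumerate(eigenvalues):
        char = [[(lam if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]
        psi = 1.0                                   # psi_nu(lambda_nu)
        for mu, other in enumerate(eigenvalues):
            if mu != nu:
                psi *= lam - other
        for i in range(n):
            for j in range(n):
                # P_ji(lambda_nu): cofactor of row j, column i of lambda_nu E - P
                result[i][j] += lam ** k * cofactor(char, j, i) / psi
    return result


def mat_pow(P, k):
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][m] * P[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return R


p0, p1 = 0.3, 0.2
P = [[1 - p0, p0], [p1, 1 - p1]]
eigs = [1.0, 1 - p0 - p1]
print(perron_special(P, eigs, 5))
print(mat_pow(P, 5))               # agrees with the line above up to rounding
```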


We shall term equation (71) the general, and equation (72) the special Perron equation. Equations (71) and (72) permit the solution of one very important problem of the theory of finite Markov chains: the problem of finding the so-called limit distribution. If the limit $\lim_{k \to \infty} P^k = P^\infty$ exists, then it is natural to term the elements of the matrix $P^\infty = \|p_{ij}^\infty\|$ the limit transition probabilities. Given the initial distribution, i.e., the probabilities $p_1, p_2, \dots, p_n$ of the various outcomes of the initial trial, we can obtain the probabilities $p_j^\infty$ of the various states in the limit distribution from the equations

$$p_j^\infty = \sum_{i=1}^{n} p_i\, p_{ij}^\infty \qquad (j = 1, 2, \dots, n). \qquad (73)$$

One would naturally expect that, after a sufficiently large number of transitions of the random automaton representing the Markov chain, the influence of the initial distribution of the state probabilities on the distribution of the states reached through these transitions becomes arbitrarily small. In other words, the limit distribution obtained using equation (73) should not depend on the initial distribution $(p_1, p_2, \dots, p_n)$. If the limit distribution has this property, then the corresponding Markov chain is termed ergodic. The ergodicity property obviously holds if and only if, for each j, all the elements $p_{ij}^\infty$ (i = 1, 2, ..., n) are identical. In other words, all the rows of the matrix of the limit transition probabilities must be identical.
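Equation (73) and the ergodicity test can be sketched numerically. The sketch assumes that the limit $P^\infty$ exists and approximates it by $P^k$ with a large k (500 here, an arbitrary choice). This is a numerical shortcut, not the Perron-equation route described in the text.

```python
# Sketch: equation (73) with P^inf approximated by P^k, plus an ergodicity test.


def mat_pow(P, k):
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][m] * P[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return R


def limit_distribution(P, initial, k=500):
    """Equation (73): p_j^inf = sum_i p_i * p_ij^inf, with P^inf ~ P^k."""
    Pk = mat_pow(P, k)
    return [sum(initial[i] * Pk[i][j] for i in range(len(P))) for j in range(len(P))]


def looks_ergodic(P, k=500, tol=1e-9):
    """Ergodic if all rows of P^inf (approximated by P^k) coincide."""
    Pk = mat_pow(P, k)
    return all(abs(row[j] - Pk[0][j]) <= tol for row in Pk for j in range(len(P)))


P = [[0.7, 0.3], [0.2, 0.8]]
print(limit_distribution(P, [1.0, 0.0]), limit_distribution(P, [0.0, 1.0]))  # both ~ [0.4, 0.6]
E2 = [[1.0, 0.0], [0.0, 1.0]]    # limit exists but depends on the initial distribution
print(looks_ergodic(P), looks_ergodic(E2))   # True False
```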

It can be shown that the moduli of the eigenvalues of a stochastic matrix cannot exceed unity. It is also easy to see that unity is an eigenvalue of every stochastic matrix: since the elements of each row sum to unity, the matrix carries the column vector (1, 1, ..., 1) into itself. If all the remaining (non-unity) eigenvalues of a stochastic matrix M are strictly less than unity in modulus, then the matrix M and the corresponding Markov chain are termed proper. If unity is a simple root of the characteristic polynomial of a proper stochastic matrix P, then the matrix P and the corresponding Markov chain are termed regular.

The following proposition is valid [14].

Theorem 2. A Markov chain C with a finite number of states has a limit distribution if and only if it is proper. For the chain C to have the property of ergodicity, it is necessary and sufficient that it be a regular chain.
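A 2 × 2 stochastic matrix [[1 − a, a], [b, 1 − b]] has the eigenvalues 1 and 1 − a − b, that is, the trace minus 1. The definitions of proper and regular matrices can therefore be read off directly. The following sketch is restricted to that case and uses a hypothetical function name. The three example matrices illustrate the three possibilities of Theorem 2.

```python
# Sketch (2 x 2 case only): classify a stochastic matrix as proper / regular
# from its second eigenvalue lambda_2 = trace - 1.


def classify_2x2(P, tol=1e-12):
    lam2 = P[0][0] + P[1][1] - 1.0
    # lam2 = 1 is not a "non-unity" eigenvalue, so such a matrix is still proper
    proper = abs(lam2) < 1.0 - tol or abs(lam2 - 1.0) <= tol
    regular = abs(lam2) < 1.0 - tol            # then unity is a simple root
    return proper, regular


def mat_pow(P, k):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        R = [[sum(R[i][m] * P[m][j] for m in range(2)) for j in range(2)] for i in range(2)]
    return R


for name, P in [("mixing", [[0.7, 0.3], [0.2, 0.8]]),
                ("identity", [[1.0, 0.0], [0.0, 1.0]]),
                ("swap", [[0.0, 1.0], [1.0, 0.0]])]:
    print(name, classify_2x2(P))
# mixing (True, True): the limit exists and the chain is ergodic
# identity (True, False): the limit exists (P^k = E) but depends on the initial state
# swap (False, False): P^k alternates between E and P, so no limit exists
print(mat_pow([[0.0, 1.0], [1.0, 0.0]], 100), mat_pow([[0.0, 1.0], [1.0, 0.0]], 101))
```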

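For a regular finite Markov chain the limit transition probabilities are determined by the equations

$$p_{ij}^\infty = \frac{P_{ji}(1)}{\psi_1(1)} \qquad (i, j = 1, 2, \dots, n), \qquad (74)$$

where the eigenvalue unity is numbered $\lambda_1 = 1$. These equations are obtained from the special Perron equation (72) by passing to the limit as $k \to \infty$. All the terms with $|\lambda_\nu| < 1$ vanish, and only the term corresponding to $\lambda_1 = 1$ remains.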
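Equation (74) can be evaluated without finding the eigenvalues. The sketch below assumes a regular chain and relies on two standard facts. At the simple root λ = 1 we have $\psi_1(1) = P'(1)$. By Jacobi's formula, $P'(1)$ equals the sum of the diagonal cofactors of E − P. This substitution is our computational choice, not part of the text, and the function names are hypothetical.

```python
# Sketch of equation (74): p_ij^inf = P_ji(1) / psi_1(1), with psi_1(1) = P'(1)
# computed as the trace of the adjugate of E - P (Jacobi's formula).


def det(M):
    if not M:
        return 1.0
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))


def cofactor(M, r, c):
    minor = [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]
    return (-1) ** (r + c) * det(minor)


def limit_transition_probabilities(P):
    """Limit transition probabilities of a regular finite Markov chain."""
    n = len(P)
    A = [[float(i == j) - P[i][j] for j in range(n)] for i in range(n)]   # E - P
    psi_1_at_1 = sum(cofactor(A, i, i) for i in range(n))                 # P'(1)
    return [[cofactor(A, j, i) / psi_1_at_1 for j in range(n)] for i in range(n)]


print(limit_transition_probabilities([[0.5, 0.5, 0.0],
                                      [0.25, 0.5, 0.25],
                                      [0.0, 0.5, 0.5]]))
# every row is [0.25, 0.5, 0.25]
```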

The results described above make it possible to construct a theory of the behavior of automata (random and deterministic) in random media. We shall limit ourselves to Moore automata only. In the case of Mealy automata certain complications of the theory arise which make it less transparent. We also agree to consider deterministic automata as a particular case of random automata, which, as mentioned above, is always possible.

With these assumptions every automaton A can be specified by the matrix $L = \|\lambda_{ij}\|$ of the output probabilities and by the family of matrices $D^{(m)} = \|d_{ik}^{(m)}\|$ of the transition probabilities. Any element $\lambda_{ij}$ of the first matrix is equal to the probability of the appearance of the jth output signal when the automaton A is in the ith state. The quantity $d_{ik}^{(m)}$ is the probability of the transition of the automaton from the ith state into the kth under the influence of the mth input signal.

The medium is specified for some class of automata having identical sets of input signals $(x_1, x_2, \dots, x_n)$ and identical sets of output signals $(v_1, v_2, \dots, v_s)$. Specifying the medium for the considered class K means specifying how the input signal x(t) of any automaton A from the class K, at an arbitrary moment t of discrete automaton time, depends on the value v(t − 1) of its output signal at the directly preceding moment. It is assumed that this dependence is the same for all the automata of the given class K. In other words, the behavior of the medium is determined only by the actions (output signals) of the automata and does not depend directly on their internal structure.

Let us consider random media in which so-called reaction probabilities $r_{jm}$ are defined. They are combined into the (rectangular) reaction probability matrix $R = \|r_{jm}\|$. The value of $r_{jm}$ is taken equal to the probability of the appearance of the mth input signal at the input of an automaton A (from the class K) operating in the considered medium, given that at the directly preceding moment of time the automaton A delivered the jth output signal.

If the reaction probabilities of the medium are constant, the corresponding medium is termed a stationary random medium. In nonstationary random media the reaction probabilities can change with time. Just as in the case of automata, deterministic media (with a rigorously defined functional relationship x(t) = f(v(t − 1))) can be considered as a particular case of random media.

It is easy to see that the study of the behavior of Moore automata (both determinate and random) in stationary random media reduces to the study of uniform Markov chains whose states can be identified with the states of the considered automata. Indeed, the state a(t) of the automaton A at any moment of time uniquely determines the probabilities of the output signals v(t). Consequently, by the definition of a stationary random medium, it also determines the probabilities of the input signals of the automaton at the directly following moment of time t + 1. The latter probabilities uniquely determine the probabilities of the transitions of the automaton A from the state a(t) into any of the following states.

With the notation adopted above, the transition probabilities $p_{ik}$ of the corresponding Markov chain are determined by the equations

$$p_{ik} = \sum_{j} \sum_{m} \lambda_{ij}\, r_{jm}\, d_{ik}^{(m)}. \qquad (75)$$
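Equation (75) is a double sum over output and input signals, and a minimal sketch follows. Indexing is 0-based, and the nested-list layout `D[m][i][k]` is an assumption of this sketch. The example is a two-state automaton that keeps its state on input 0 and flips it on input 1, with arbitrary penalty probabilities.

```python
# Sketch of equation (75): the Markov chain of a Moore automaton in a stationary
# random medium.  L[i][j] = lambda_ij (output j in state i), R[j][m] = r_jm
# (input m after output j), D[m][i][k] = d_ik^(m) (state i -> k on input m).


def automaton_in_medium(L, R, D):
    n = len(L)
    return [[sum(L[i][j] * R[j][m] * D[m][i][k]
                 for j in range(len(R))
                 for m in range(len(D)))
             for k in range(n)]
            for i in range(n)]


p0, p1 = 0.3, 0.2
L = [[1.0, 0.0], [0.0, 1.0]]          # state 1 outputs v0, state 2 outputs v1
R = [[1 - p0, p0], [1 - p1, p1]]      # reaction probabilities of the medium
D = [[[1.0, 0.0], [0.0, 1.0]],        # D^(0): keep the state
     [[0.0, 1.0], [1.0, 0.0]]]        # D^(1): flip the state
P = automaton_in_medium(L, R, D)
print(P)                              # [[1 - p0, p0], [p1, 1 - p1]]
```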

Following Tsetlin [82], we shall describe several very simple problems on the behavior of automata in random media. To do this we consider the class K of determinate Moore automata having two input signals $x_0 = 0$ and $x_1 = 1$ and two output signals $v_0 = 0$ and $v_1 = 1$. We specify the stationary random medium C by the matrix R of the reaction probabilities

$$R = \begin{Vmatrix} 1 - p_0 & p_0 \\ 1 - p_1 & p_1 \end{Vmatrix}.$$


Let us call the input signal $x_1 = 1$ a penalty and the input signal $x_0 = 0$ a non-penalty. Then we can say that the medium penalizes the automaton with probability $p_0$ when the signal $v_0$ is output, and with probability $p_1$ when the signal $v_1$ is output.

Let us consider first the Moore automaton A with the two states $a_1 = 1$ and $a_2 = 2$, given by the matrix L of output probabilities and the matrices $D^{(0)}$, $D^{(1)}$ of transition probabilities:

$$L = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}, \qquad D^{(0)} = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}, \qquad D^{(1)} = \begin{Vmatrix} 0 & 1 \\ 1 & 0 \end{Vmatrix}.$$

In other words, A is a determinate automaton which delivers the output signal $v_0 = 0$ in the first state and the output signal $v_1 = 1$ in the second state. It retains its state under the influence of the input signal $x_0 = 0$ and changes to the opposite state under the influence of the input signal $x_1 = 1$.

In accordance with what has been said above, the functioning of the automaton A in the medium C is described by the uniform Markov chain M with the two states $a_1 = 1$ and $a_2 = 2$. From equation (75) we easily find the matrix P of the transition probabilities of this chain:

$$P = \begin{Vmatrix} 1 - p_0 & p_0 \\ p_1 & 1 - p_1 \end{Vmatrix}.$$

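The characteristic polynomial of this matrix is

$$P(\lambda) = \begin{vmatrix} \lambda - 1 + p_0 & -p_0 \\ -p_1 & \lambda - 1 + p_1 \end{vmatrix} = \lambda^2 - \lambda\,(2 - p_0 - p_1) + 1 - p_0 - p_1,$$

and its eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 1 - p_0 - p_1$. If both probabilities $p_0$ and $p_1$ are different from zero and unity, then the modulus of the second eigenvalue $\lambda_2$ is less than unity. In this case the chain M is regular and therefore, by Theorem 2, ergodic.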

The polynomial $\psi_1(\lambda)$ is obviously equal to $\lambda - 1 + p_0 + p_1$. Applying equations (74), we easily find the limit transition probabilities of the considered chain:

$$p_{11}^\infty = p_{21}^\infty = \frac{P_{11}(1)}{\psi_1(1)} = \frac{p_1}{p_0 + p_1}$$

and

$$p_{12}^\infty = p_{22}^\infty = 1 - \frac{p_1}{p_0 + p_1} = \frac{p_0}{p_0 + p_1}.$$


Thus, after sufficiently long functioning in the medium C, the automaton A, regardless of the choice of the initial state, will be in the first state with probability $p_1/(p_0 + p_1)$ and in the second state with probability $p_0/(p_0 + p_1)$. The penalty probability is $p_0$ in the first state of the automaton and $p_1$ in the second. Hence the mathematical expectation S of the penalty of the automaton A at each step (after sufficiently long preliminary functioning) is

$$S = p_0\,\frac{p_1}{p_0 + p_1} + p_1\,\frac{p_0}{p_0 + p_1} = \frac{2 p_0 p_1}{p_0 + p_1}.$$

For $p_0 \neq p_1$ the quantity S is strictly less than the mean penalty probability $S_0 = \frac{1}{2}(p_0 + p_1)$. Indeed,

$$S_0 - S = \frac{(p_0 + p_1)^2 - 4 p_0 p_1}{2 (p_0 + p_1)} = \frac{(p_0 - p_1)^2}{2 (p_0 + p_1)} \geq 0,$$

where equality is obviously achieved only when $p_0 = p_1$. Thus, the considered automaton A possesses purposeful behavior in the following sense. When placed in any stationary random medium which differentiates its two possible reactions, it tends toward the behavior for which the penalty is on the average less than for an automaton delivering both of its possible output signals (reactions) with equal probabilities.
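The formula for S and the inequality $S \le S_0$ can be checked, and S can also be estimated by simulating the automaton A in the medium C. The sketch below is illustrative, and its function names are hypothetical. The penalty probabilities, the seed and the number of steps are arbitrary. The simulated estimate matches S only up to sampling error.

```python
import random


def expected_penalty(p0, p1):
    """S = 2 p0 p1 / (p0 + p1) for the two-state automaton A."""
    return 2 * p0 * p1 / (p0 + p1)


def simulated_penalty(p0, p1, steps, rng):
    """Monte Carlo estimate: run A in the medium C and count the penalties."""
    state, penalties = 0, 0            # state 0 outputs v0, state 1 outputs v1
    for _ in range(steps):
        penalized = rng.random() < (p0 if state == 0 else p1)
        penalties += penalized
        if penalized:
            state = 1 - state          # a penalty (input x1) flips the state
    return penalties / steps


p0, p1 = 0.2, 0.6
S, S0 = expected_penalty(p0, p1), (p0 + p1) / 2
print(S, S0, S0 - S, (p0 - p1) ** 2 / (2 * (p0 + p1)))   # S0 - S equals the last value
print(simulated_penalty(p0, p1, 200_000, random.Random(0)))   # close to S = 0.3
```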

Let us now take, in place of the automaton A considered above, the automaton $A_n$ with 2n states 1, 2, ..., n, n + 1, ..., 2n − 1, 2n. We assume that in the states 1, 2, ..., n it delivers the output signal $v_0 = 0$, and in all the remaining states the output signal $v_1 = 1$. Assume, further, that the transition table of the automaton $A_n$ is as follows (rows correspond to the input signal x, columns to the current state a):

$$
\begin{array}{c|cccccc|cccccc}
x \backslash a & 1 & 2 & 3 & \cdots & n-1 & n & n+1 & n+2 & n+3 & \cdots & 2n-1 & 2n \\ \hline
0 & 1 & 1 & 2 & \cdots & n-2 & n-1 & n+1 & n+1 & n+2 & \cdots & 2n-2 & 2n-1 \\
1 & 2 & 3 & 4 & \cdots & n & 2n & n+2 & n+3 & n+4 & \cdots & 2n & n
\end{array}
$$

To this table there corresponds the transition graph shown in Fig. 12.

[Fig. 12: transition graph of the automaton $A_n$]

From the form of its graph it is natural to call $A_n$ an automaton with linear tactic. The automaton A analyzed above is obviously the particular case of the automaton $A_n$ with linear tactic for which n = 1. The behavior of the automaton $A_n$ with linear tactic in the general case is studied exactly as in the particular case considered, although, of course, the corresponding computations are considerably more complex. These analyses concern the mathematical expectation $S_n$ of the penalty of the automaton $A_n$ at one step of its operation (after a sufficiently long period of adaptation), calculated by analogy with the preceding case. They lead to the conclusion that, with unlimited increase of n, $S_n$ tends toward the natural minimal value $S_{\min}$, equal to the smaller of the two numbers $p_0, p_1$, provided that this smaller number does not exceed 1/2. It is easy to see that the quantity $S_{\min}$ is the absolute minimum of the mathematical expectation of penalty for all automata operating in the considered random medium.
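The convergence of $S_n$ can be explored numerically. The sketch below, whose helper names are hypothetical, encodes the transition table of $A_n$ and builds the chain of equation (75) for the medium C. It then solves $\pi P = \pi$ by Gaussian elimination instead of using the Perron equation. With $p_0 = 0.2$, $p_1 = 0.6$ the values decrease toward 0.2. With both probabilities above 1/2 they level off above $\min(p_0, p_1)$.

```python
# Sketch: expected penalty S_n of the linear-tactic automaton A_n in the medium C.


def next_state(s, x, n):
    """Transition table of A_n: states 1..2n, x = 0 (no penalty) or 1 (penalty)."""
    if s <= n:                                   # states delivering v0; state 1 is deepest
        if x == 0:
            return max(s - 1, 1)
        return s + 1 if s < n else 2 * n
    if x == 0:                                   # states delivering v1; state n+1 is deepest
        return max(s - 1, n + 1)
    return s + 1 if s < 2 * n else n


def stationary(P):
    """Solve pi P = pi, sum(pi) = 1 by Gaussian elimination with partial pivoting."""
    size = len(P)
    A = [[P[j][i] - (i == j) for j in range(size)] for i in range(size)]
    A[-1] = [1.0] * size                         # replace one equation by normalization
    b = [0.0] * (size - 1) + [1.0]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, size):
            factor = A[r][col] / A[col][col]
            for c in range(col, size):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * size
    for r in reversed(range(size)):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, size))) / A[r][r]
    return x


def expected_penalty_linear(n, p0, p1):
    penalty = [p0 if s <= n else p1 for s in range(1, 2 * n + 1)]
    P = [[0.0] * (2 * n) for _ in range(2 * n)]
    for s in range(1, 2 * n + 1):
        P[s - 1][next_state(s, 1, n) - 1] += penalty[s - 1]      # penalized
        P[s - 1][next_state(s, 0, n) - 1] += 1 - penalty[s - 1]  # not penalized
    pi = stationary(P)
    return sum(w * q for w, q in zip(pi, penalty))


print([round(expected_penalty_linear(n, 0.2, 0.6), 4) for n in (1, 2, 5, 10)])
# S_1 = 2 p0 p1 / (p0 + p1) = 0.3, then decreasing toward min(p0, p1) = 0.2
print(round(expected_penalty_linear(20, 0.7, 0.8), 4))
# with both p0, p1 > 1/2 the values stay above min(p0, p1) = 0.7
```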

It turns out that in many cases the apparatus of uniform Markov chains can be used successfully to study the behavior of automata not only in stationary but also in certain nonstationary random media. Let us assume, for example, that there are several stationary random media $C_1, C_2, \dots, C_k$, similar to the medium C described above but having different probability pairs $p_0, p_1$. From these media we can construct a nonstationary random medium N by introducing the matrix $B = \|b_{ij}\|$ of the transition probabilities of some Markov chain with k states. At any given moment t of discrete time the medium N acts like one of the media $C_1, C_2, \dots, C_k$. If it acts like the medium $C_i$, we say that the medium N is in the ith state. The quantity $b_{ij}$ is the probability of the transition of the medium N from the ith state into the jth (i, j = 1, 2, ..., k). The probabilities $b_{ij}$ are assumed to be constant in the course of time.

If now some automaton A functions in the medium N, then the pairs $(C_i, a_j)$, consisting of the state $C_i$ of the medium N and the state $a_j$ of the automaton A, can be chosen as the states of some uniform Markov chain. The matrix of the transition probabilities of this chain can easily be constructed from three ingredients: the reaction probability matrices of the media $C_1, C_2, \dots, C_k$, the matrix B, and the switching and output functions of the automaton A.
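The construction of the chain on pairs $(C_i, a_j)$ can be sketched as follows. The text does not fix the order of events within one step, so we assume one. First, the current medium state i converts the automaton's output into an input using the reaction matrix of $C_i$. Next, the automaton moves. Independently, the medium moves from i to j with probability $b_{ij}$. The media, the matrix B and the function name in the example are hypothetical.

```python
# Sketch: transition matrix of the Markov chain on pairs (medium state i, automaton state a).


def product_chain(B, Rs, L, D):
    """B: k x k medium matrix; Rs[i]: reaction matrix of C_i; L, D as in (75).
    Returns (states, P) with the states listed as pairs (i, a), 0-based."""
    k, n = len(B), len(L)
    states = [(i, a) for i in range(k) for a in range(n)]
    P = [[0.0] * len(states) for _ in states]
    for s, (i, a) in enumerate(states):
        for t, (j, c) in enumerate(states):
            automaton_move = sum(L[a][v] * Rs[i][v][m] * D[m][a][c]
                                 for v in range(len(L[a]))
                                 for m in range(len(D)))
            P[s][t] = B[i][j] * automaton_move
    return states, P


L = [[1.0, 0.0], [0.0, 1.0]]                               # two-state automaton A
D = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
Rs = [[[0.9, 0.1], [0.4, 0.6]],                            # C_1: p0 = 0.1, p1 = 0.6
      [[0.4, 0.6], [0.9, 0.1]]]                            # C_2: p0 = 0.6, p1 = 0.1
B = [[0.95, 0.05], [0.05, 0.95]]                           # the medium switches rarely
states, P = product_chain(B, Rs, L, D)
print(states)
print([round(sum(row), 12) for row in P])                  # every row sums to 1
```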
It can be shown [82] that the class K of automata $A_n$ with linear tactic operating in the described nonstationary medium N contains (depending on the choice of the medium N) an optimal automaton $A_{n_0}$. This automaton has the minimal (within the class K) mathematical expectation of the limiting value of the penalty at each step of its operation. Thus, in contrast with stationary media, for automata with linear tactic operating in nonstationary media it is advisable to increase the memory size of the automaton (its number of states) only up to a certain limit. Beyond this limit, a further increase of the memory leads to deterioration rather than improvement of the quality of the automaton's operation.
§5. THE PROBLEM OF PATTERN RECOGNITION TRAINING

One of the most significant fields of application of the theory of self-organizing systems is the problem of the recognition of visual patterns. The recognition of visual patterns, and learning to perform such recognition, are a brilliant example of the adaptive properties of the human brain. The essence of pattern recognition is that the human observer combines certain sets of objects or phenomena which he observes into a single class, termed a pattern. The patterns with which a human being operates are not random combinations of objects, but rather combinations which are related by some common properties. Considering mainly visual patterns, we shall henceforth call the individual objects which compose a pattern images.

Examples of visual patterns are the set of all the images of a particular letter or digit, the set of the images of all possible buildings, etc. By analogy with visual patterns we can also consider sound patterns (for example, the set of all pronunciations of a particular phoneme, the set of all waltzes, etc.) and patterns of any other nature. In what follows we shall limit ourselves, for the sake of examples, to visual patterns only. However, all our theoretical constructions will be applicable not only to visual patterns but also to any other patterns.

For the following constructions we first of all need definitions of abstract images and patterns. We will assume that images are perceived by some set of sensitive elements, the receptors. By analogy with the case of visual patterns perceived by the human eye, we call this set the retina. In the case of abstract images and patterns we need not be more specific about the spatial arrangement of the receptors. In the case of specific visual patterns, however, this refinement is useful. In this case we will usually assume that the receptors constituting the retina are arranged on a plane at points forming a regular grid. In other words, they are placed at the points with the coordinates (a + ic, b + jc), where c ≠ 0, i runs through the set of values 0, 1, ..., m − 1, and j runs through the set of values 0, 1, ..., n − 1. In what follows we shall call such a retina a regular rectangular (n × m)-retina.
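The short sketch below lists the receptor coordinates of a regular rectangular (n × m)-retina. The function name and the parameter values are arbitrary.

```python
# Sketch: receptor coordinates (a + i c, b + j c) of a regular rectangular (n x m)-retina.


def retina_points(a, b, c, m, n):
    """Points for i = 0..m-1, j = 0..n-1; the grid step c must be nonzero."""
    if c == 0:
        raise ValueError("grid step c must be nonzero")
    return [(a + i * c, b + j * c) for j in range(n) for i in range(m)]


pts = retina_points(a=0.0, b=0.0, c=0.5, m=4, n=3)
print(len(pts), pts[:4])   # 12 receptors; first row: x = 0.0, 0.5, 1.0, 1.5 at y = 0.0
```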

The task of the retina is to convert the image projected onto it into some ensemble of signals of a standard form, which are put out by the receptors composing the retina. In what follows we shall distinguish two kinds of receptors. The so-called continuous receptors have output signals that can be any real numbers on some fixed segment of the number line. The so-called discrete receptors can deliver only two different output signals. Without loss of generality, we can fix the segment [0, 1] as the domain of values of the output signals of the continuous receptors. Likewise, we can fix the ends of this segment, i.e., the numbers 0 and 1, as the possible values of the output signals of the discrete receptors.

In the case of visual patterns we will always assume that the output signal of a continuous receptor is equal to the brightness of the image point projected onto that receptor, expressed in relative units. Zero corresponds to absolutely black points of the image, and unity to absolutely white points (reflecting 100% of the light incident on them). For discrete receptors we establish some brightness threshold. Points whose brightness does not exceed this threshold correspond to a zero output signal, and all other points to a unit output signal. More specifically, however, in the case of discrete receptors we shall consider only two-tone images, consisting either of points of zero brightness (the background) or of points of unit brightness (the image itself). In what follows we shall use precisely this latter point of view.

An abstract image is the term we shall give to any fixed ensemble of
output signals of the receptors constituting the retina. If the total
number of receptors in the retina is N, then, in view of the
assumptions made above, an abstract image can naturally be identified
with some point of the N-dimensional unit cube. In the case of
continuous receptors all points of this cube correspond to images,
while in the case of discrete receptors only the vertices of the cube
do. Accordingly, we shall call the N-dimensional unit cube (for
continuous receptors) or the set of its vertices (for discrete
receptors) the image space. In the first case this space is
continuous; in the second it is discrete, consisting of 2^N different
points.
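A minimal sketch of these conventions, assuming that an image is given as a flat sequence of brightness values in [0, 1], one per receptor. The names `discrete_image` and `discrete_image_space` and the default threshold 0.5 are hypothetical choices made for illustration.

```python
from itertools import product


def discrete_image(brightness, threshold=0.5):
    """Discrete receptors: 0 if brightness <= threshold, else 1.

    The default threshold 0.5 is an illustrative assumption.
    """
    return tuple(0 if v <= threshold else 1 for v in brightness)


def discrete_image_space(N):
    """All vertices of the N-dimensional unit cube (the discrete image space)."""
    return list(product((0, 1), repeat=N))


# A continuous abstract image is any point of [0, 1]^N; here N = 4.
continuous = (0.1, 0.9, 0.5, 0.7)
print(discrete_image(continuous))      # (0, 1, 0, 1)
print(len(discrete_image_space(4)))    # 16, i.e. 2**4
```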

It is natural to introduce the following "metric" in the image space:
the distance between two points of this space (i.e., between two
images) x = (x_1, ..., x_N) and y = (y_1, ..., y_N) is the square root
of the sum of the squares of the differences of their corresponding
coordinates,

$$\rho(x, y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}.$$

The R-neighborhood of any point M of the image space is the set of all
points of this space whose distance from M is less than or equal to R.

We note that these definitions apply to both the continuous and the
discrete image spaces. In the discrete case the distance between any
two points is the square root of the total number of coordinates in
which these points do not coincide. However, for discrete spaces it is
more convenient for us to take this number itself as the distance. Then
all distances are expressed by whole numbers. Once a distance has been
introduced in the image space, we can speak of the closeness of
particular points to one another. From the intuitive considerations
associated with the concept of a pattern, it follows that images lying
sufficiently close to a given image from some pattern must themselves
belong to this pattern. This circumstance must somehow be taken into
account in defining the concept of an abstract pattern.
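As an illustration, the two distances and the R-neighborhood of a point in the discrete space can be computed as follows. The function names are hypothetical. Note that for 0/1 vectors the square of the Euclidean distance equals the number of noncoincident coordinates, which is why switching to the discrete convention loses nothing.

```python
import math
from itertools import product


def euclidean(x, y):
    """Continuous image space: square root of the sum of squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def discrete_distance(x, y):
    """Discrete image space: the number of noncoincident coordinates."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def neighborhood(M, R, N):
    """R-neighborhood of the vertex M in the discrete space {0, 1}^N."""
    return [x for x in product((0, 1), repeat=N) if discrete_distance(x, M) <= R]


print(euclidean((0, 0), (1, 1)))                  # equals math.sqrt(2)
print(discrete_distance((0, 1, 1), (1, 1, 0)))    # 2
print(len(neighborhood((0, 0, 0), 1, 3)))         # 4: M and its 3 neighbours
```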

In the case of a continuous image space, a pattern can be only a set of
points of this space which, together with any of its points M, also
wholly contains some ε-neighborhood of M (the magnitude of ε depends on
the choice of the point M). Sets having this property are termed open
sets. Thus, in the case of continuous receptors we shall term any open
set of the image space an abstract pattern.

An example of an open set is the interior of a sphere having the same
dimension as the (continuous) image space under consideration. It is
important to emphasize once again that how small the changes
introduced in an image must be, if its membership in a given pattern is
not to change, depends on the choice of the image itself. The
permissible variations are smaller for images located closer to the
boundary of the pattern (in this example the surface of the sphere
serves as the boundary) and larger for images sufficiently far from
the boundary.


In the case of the discrete image space all its subsets are obviously
open sets (the ε-neighborhood of a point with ε < 1 consists of that
point alone) and can consequently be considered abstract patterns. To
narrow the classes of patterns in the discrete space we can use the
concept of the boundary index of a set. Let M be any subset of the
discrete image space R introduced above, and m the number of elements
of this set. If M does not coincide with the entire space R, then among
its elements there will be some at distance 1 from which there lie
elements not belonging to M. Let us term such elements boundary
elements and denote by $m_1$ the number of all boundary elements of M.
We call the ratio $m_1/m$ the boundary index of the set M.
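For small discrete spaces the boundary index can be computed directly from its definition. This sketch, with hypothetical names, enumerates the neighbours at distance 1 of a vertex by flipping one coordinate at a time.

```python
from itertools import product


def boundary_index(M, N):
    """Boundary index m_1 / m of a nonempty subset M of {0, 1}^N.

    An element of M is a boundary element if some vertex at distance 1
    from it (one flipped coordinate) lies outside M.
    """
    M = set(M)
    if not M:
        raise ValueError("M must be nonempty")

    def neighbours(x):
        for i in range(N):
            yield x[:i] + (1 - x[i],) + x[i + 1:]

    boundary = [x for x in M if any(y not in M for y in neighbours(x))]
    return len(boundary) / len(M)


# A face of the 3-cube: every vertex has a neighbour outside it.
face = [x for x in product((0, 1), repeat=3) if x[0] == 0]
# The 1-neighbourhood of 0000: only its centre is an interior element.
ball = [x for x in product((0, 1), repeat=4) if sum(x) <= 1]
print(boundary_index(face, 3), boundary_index(ball, 4))   # 1.0 0.8
```

The whole space has boundary index 0, in agreement with the remark that boundary elements exist only when M differs from R.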

It is easy to see that the smaller the boundary index of a set, the
more closely it resembles, in its properties, the open sets of
continuous spaces: an ever larger portion of the points of the set are
contained in it together with their 1-neighborhoods. Therefore it is
natural to state the proposition which Braverman [11] has termed the
compactness hypothesis: only sets whose boundary indices are
sufficiently small can serve as patterns in the discrete image space.

A more detailed study shows that this proposition must be refined by
means of certain additional probability-theoretic constructions. Let R
be a discrete image space in which some subset S (a pattern) has been
fixed. Further, let each element $x_i$ of R be assigned the
probability $f(x_i)$ of the appearance of this element (image) in some
series of experiments of the type of independent trials, and let
$f_S(x_i)$ be the conditional probability of the appearance of $x_i$
given that it belongs to the pattern S. If the element $x_i$ belongs
to S, then for any natural number k we denote by $S^c_k(x_i)$ the set
of points not contained in the pattern S and removed from $x_i$ by a
distance less than or equal to k. The quantity

$$g_S(k) = \sum_{x_i \in S} f_S(x_i) \sum_{x_j \in S^c_k(x_i)} f(x_j)$$

is the probability that, in the following trial, an element not
belonging to S is wrongly assigned to S as a result of including in S
all elements of the k-neighborhood of the element $x_i$ randomly
selected from S in the preceding trial.

We term the operation of including in a pattern S all elements of the
k-neighborhood of some element x of this pattern the operation of
k-extrapolation with respect to the element x.

The quantity $g_S(k)$ found above is thus the probability of an error
resulting from k-extrapolation with respect to a randomly selected
element of the pattern S. The refinement of the compactness hypothesis
mentioned above consists in the assumption that for every discrete
image space R there exists a number N = N(R) ≥ 1 such that for all
values k ≤ N the probability $g_S(k)$ of an error resulting from
k-extrapolation does not exceed a negligibly small constant
nonnegative quantity ε, for any pattern S. We term this the hypothesis
of the N-extrapolatability of the patterns with accuracy to ε.
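For a small discrete space the error probability and the largest N satisfying the extrapolatability condition can be found by brute force. This is a sketch under stated assumptions: $g_S(k)$ is taken as the sum over $x_i \in S$ of $f_S(x_i)$ times the probability that the next image lies outside S within distance k of $x_i$; the probabilities f are given as a dictionary over the whole space; $f_S(x) = f(x)/P(S)$; and the function names are hypothetical. The check covers a single pattern S, whereas the hypothesis requires the bound for every pattern.

```python
from itertools import product


def dist(x, y):
    """Discrete distance: number of noncoincident coordinates."""
    return sum(a != b for a, b in zip(x, y))


def error_probability(f, S, k):
    """g_S(k) for pattern S; f maps every point of the space to its probability."""
    S = set(S)
    p_S = sum(f[x] for x in S)
    if p_S == 0:
        raise ValueError("pattern S has zero probability")
    g = 0.0
    for xi in S:
        # probability that the next image lies in S^c_k(x_i)
        miss = sum(f[xj] for xj in f if xj not in S and dist(xi, xj) <= k)
        g += (f[xi] / p_S) * miss          # f_S(x_i) = f(x_i) / P(S)
    return g


def largest_N(f, S, eps, k_max):
    """Largest N <= k_max with g_S(k) <= eps for k = 1, ..., N (0 if none)."""
    N = 0
    for k in range(1, k_max + 1):
        if error_probability(f, S, k) > eps:
            break
        N = k
    return N


# Only two images ever appear: 000 (in S) and 111 (outside S).
f = {x: 0.0 for x in product((0, 1), repeat=3)}
f[(0, 0, 0)] = f[(1, 1, 1)] = 0.5
S = [x for x in f if sum(x) <= 1]
print(error_probability(f, S, 2), error_probability(f, S, 3))   # 0.0 0.5
print(largest_N(f, S, eps=0.0, k_max=3))                        # 2
```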

The operation of k-extrapolation can obviously also be defined for
patterns in continuous image spaces. Replacing summation by
integration, we can obtain, by analogy with the expression for
$g_S(k)$, an expression for the probability of an error resulting from
extrapolation in the continuous case. In exactly the same way as for
discrete spaces, we can then formulate the hypothesis of the
N-extrapolatability of patterns in continuous image spaces.

The pattern recognition problem requires a precise description of the
features characterizing each pattern. However, such a description can
by no means always be given without overcoming very serious
difficulties. Therefore, in practice we usually follow the path of
constructing algorithms that make it possible to carry out so-called
training for pattern recognition. The essence of this training is to
obtain an approximate description (or, as it is customarily phrased,
an approximation) of the pattern as a result of showing some set
(generally speaking, not all!) of the images composing this pattern.

Based on the hypothesis of the N-extrapolatability of patterns, we can
construct the so-called general approximation algorithm, which makes it
possible to carry out pattern recognition training.

Like every algorithm of the self-improving type, the general
approximation algorithm A has two periods of operation: the learning
period and the examination period.

In the learning period various representations of the patterns
$R_1, R_2, \ldots, R_n$ that are to be recognized are applied to the
input of the algorithm A. The representations (images) are chosen at
random (most frequently by the method of independent trials) and are
accompanied by an indication of which of the patterns
$R_1, R_2, \ldots, R_n$ each selected image belongs to. All images
shown in the learning period are stored and are used in the
examination regime to determine to which of the patterns
$R_1, \ldots, R_n$ the next image r (also selected at random) belongs.

To do this, the distances in the image space from the image r are
computed, first to the representatives of the pattern $R_1$ selected
in the learning period, then to those of the pattern $R_2$, and so on,
until some distance to a representative of some pattern $R_i$
(i = 1, 2, ..., n) is found to be less than or equal to the
extrapolatability coefficient N. In this case the image r is assigned
to the pattern $R_i$. If, however, all the distances are larger than
N, the image r remains unrecognized, i.e., it is not assigned to any of
the patterns $R_1, \ldots, R_n$.
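A minimal Python sketch of algorithm A with this first examination regime, assuming discrete 0/1 images and the discrete distance. Patterns are numbered from 0 rather than 1, and the class and method names are hypothetical.

```python
def dist(x, y):
    """Discrete distance: number of noncoincident coordinates."""
    return sum(a != b for a, b in zip(x, y))


class GeneralApproximation:
    """Sketch of algorithm A (first examination regime).

    Learning: every shown image is stored with its pattern index.
    Examination: patterns are scanned in order; the image goes to the first
    pattern having a stored representative within distance N, else None.
    """

    def __init__(self, n_patterns, N):
        self.N = N
        self.memory = [[] for _ in range(n_patterns)]

    def learn(self, image, pattern):
        self.memory[pattern].append(image)

    def examine(self, image):
        for i, representatives in enumerate(self.memory):
            if any(dist(image, r) <= self.N for r in representatives):
                return i
        return None  # unrecognized


A = GeneralApproximation(n_patterns=2, N=1)
A.learn((0, 0, 0, 0), 0)
A.learn((1, 1, 1, 1), 1)
print(A.examine((0, 0, 0, 1)))   # 0
print(A.examine((1, 1, 1, 0)))   # 1
print(A.examine((0, 0, 1, 1)))   # None: distance 2 from both representatives
```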

It is easy to see that if all the patterns $R_1, R_2, \ldots, R_n$ are
N-extrapolatable with accuracy to ε and each can be covered by a number
of N-neighborhoods (spheres of radius N) significantly smaller than
1/ε, then the described algorithm gives a good approximation of the
chosen patterns (with a small probability of error on examination).

In practice different patterns in the image space are usually not in
direct contact with one another. If we take as the coefficient N the
minimal distance between patterns, then in the discrete space the
absence of contact between patterns means that N ≥ 2. If we also
assume that the probability of the appearance of images not belonging
to any of the selected patterns $R_1, R_2, \ldots, R_n$ is equal to
zero, then all these patterns are obviously (N - 1)-extrapolatable
with accuracy to ε = 0. Under these assumptions the general
approximation algorithm using (N - 1)-neighborhoods obviously leads,
after a sufficiently long learning period, to an arbitrarily good
approximation (with an arbitrarily small error probability).

This general approximation algorithm admits further improvements in
several different directions. First, in addition to the examination
regime described above we can introduce a second type of examination
regime. In it the image r appearing in the examination regime is
assigned to that one of the patterns $R_1, R_2, \ldots, R_n$ which
contains the representative (memorized in the learning period) located
closest to r. For definiteness we assume that if there are several
such patterns, preference is given to the one with the smallest
number.
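The second examination regime amounts to a nearest-neighbour rule. In this sketch (hypothetical names, patterns numbered from 0), the tie-breaking rule of the text is obtained by scanning patterns in order and replacing the current best only on a strictly smaller distance.

```python
def dist(x, y):
    """Discrete distance: number of noncoincident coordinates."""
    return sum(a != b for a, b in zip(x, y))


def nearest_pattern(image, memory):
    """Pattern owning the stored representative closest to image.

    memory[i] lists the representatives of pattern i memorized during
    learning. Ties go to the pattern with the smallest index.
    """
    best_pattern, best_distance = None, None
    for i, representatives in enumerate(memory):
        for r in representatives:
            d = dist(image, r)
            # strict '<' keeps the earlier (smaller-numbered) pattern on ties
            if best_distance is None or d < best_distance:
                best_pattern, best_distance = i, d
    return best_pattern


memory = [[(0, 0, 0, 0)], [(1, 1, 1, 1)]]
print(nearest_pattern((0, 0, 1, 1), memory))   # 0 (tie at distance 2)
print(nearest_pattern((1, 1, 1, 0), memory))   # 1
```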

The second improvement consists in economizing memory: if the
N-neighborhood of any image S appearing in the learning period is
completely covered by the N-neighborhoods of other images also shown
in the training period, then the image S can be eliminated from memory
at once and not used during examination. In practice it is advisable
to use this improvement in a somewhat different form, in which the
maximal number of representations of each pattern that can be
remembered in the learning period is limited in advance. For each
remembered representation its relative usefulness is recorded. As the
criterion of the relative usefulness of any image r we can use, for
example, the ratio of the number of cases in which this image was used
for the correct recognition of subsequent images to the total number
of images that appeared after r was memorized. Only those images are
memorized which cannot be correctly recognized with the aid of the
images already in memory. If, in addition, the memory set aside for
storing the representations of a particular pattern turns out to be
completely full, then the representation being memorized forces out of
memory the representation of that pattern with the lowest relative
usefulness.
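The bounded-memory variant can be sketched as follows. Several details are not fixed by the text and are assumptions here: correctness during learning is judged by the nearest stored representative (the second examination regime), an entry that has not yet seen any later image is given usefulness 1.0 so that it is not evicted immediately, and the class name is hypothetical.

```python
def dist(x, y):
    """Discrete distance: number of noncoincident coordinates."""
    return sum(a != b for a, b in zip(x, y))


class BoundedMemory:
    """Learning with at most `capacity` stored representatives per pattern."""

    def __init__(self, n_patterns, capacity):
        self.capacity = capacity
        # each entry is [image, correct_uses, images_seen_since_memorized]
        self.memory = [[] for _ in range(n_patterns)]

    @staticmethod
    def usefulness(entry):
        _, uses, seen = entry
        return uses / seen if seen else 1.0   # assumption for fresh entries

    def learn(self, image, pattern):
        """Show one labelled image; return True if it was memorized."""
        for entries in self.memory:
            for e in entries:
                e[2] += 1   # one more image appeared after e was memorized
        # nearest stored representative over all patterns (ties -> smaller index)
        best_d, best_p, best_e = None, None, None
        for i, entries in enumerate(self.memory):
            for e in entries:
                d = dist(image, e[0])
                if best_d is None or d < best_d:
                    best_d, best_p, best_e = d, i, e
        if best_p == pattern:
            best_e[1] += 1  # e was used for a correct recognition
            return False    # recognized correctly: nothing new is stored
        slots = self.memory[pattern]
        if len(slots) >= self.capacity:
            worst = min(range(len(slots)), key=lambda j: self.usefulness(slots[j]))
            del slots[worst]  # force out the least useful representative
        slots.append([image, 0, 0])
        return True


mem = BoundedMemory(n_patterns=2, capacity=1)
shown = [((0, 0, 0, 0), 0), ((0, 0, 0, 1), 0), ((1, 1, 1, 1), 1), ((1, 1, 1, 0), 0)]
results = [mem.learn(img, pat) for img, pat in shown]
print(results)         # [True, False, True, True]
print(mem.memory[0])   # [[(1, 1, 1, 0), 0, 0]]
```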

The third improvement involves using, along with the "natural" retina
(onto which the images being recognized are projected directly), a new
retina whose output signals are suitably selected functions of the
output signals of the first retina. The image space in which the
patterns are defined is constructed from the output signals of the
second retina (incidentally, it is more natural to call it the feature
space rather than the image space). With a judiciously chosen
transformation of the original space we can considerably simplify the
problem of pattern recognition training. Commonly used, in particular,
are transformations that generate features independent of parallel
shifts or changes in the size of the images.
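One very simple shift-invariant feature transformation, chosen here only as an illustration (the text does not prescribe one), moves the figure of a two-tone image into the top-left corner of the retina. It handles parallel shifts but not changes of size; the function name is hypothetical.

```python
def shift_normalize(image):
    """Translate a two-tone image so its figure touches the top and left edges.

    image is a tuple of rows of 0/1 values (1 = figure, 0 = background).
    Images differing only by a parallel shift give the same result, provided
    the figure is not cut off by the edge of the retina.
    """
    cells = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    if cells:
        r0 = min(r for r, _ in cells)
        c0 = min(c for _, c in cells)
        for r, c in cells:
            out[r - r0][c - c0] = 1
    return tuple(tuple(row) for row in out)


a = ((0, 0, 0), (0, 1, 1), (0, 0, 0))
b = ((1, 1, 0), (0, 0, 0), (0, 0, 0))
print(shift_normalize(a) == shift_normalize(b))   # True
```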

Finally, we will indicate still another modification of the general
approximation algorithm A. The regime in which this algorithm
receives, during the period of its self-improvement, certain images
together with an indication of the pattern to which they belong is
termed the training regime. In many cases, in addition to this regime,
it is advisable to consider the so-called self-training regime.

In the self-training regime intended for forming n different patterns,
first n randomly selected images $r_1, r_2, \ldots, r_n$ are given,
each of which is taken to be the representative of one pattern. The
next image shown, $s_1$, is assigned to the pattern whose
representative is located closest to $s_1$, and $s_1$ is added to the
representatives of this pattern. In the following step the next image
$s_2$ is compared with the enlarged set of representatives and is
again assigned to the pattern one of whose representatives (any one)
is located closer to $s_2$ than the representatives of all the other
patterns. Thereafter storage proceeds either according to the usual
scheme or by the scheme with replacement described above (with limited
memory). The recognition of images in the examination regime can be
performed by either of the two methods described above: by
N-extrapolation of the patterns, or by the method based on determining
the shortest distance.
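The self-training regime resembles sequential clustering. The sketch below assumes discrete images, uses the usual storage scheme (every image is kept), and breaks ties in favour of the smaller pattern index; the function name is hypothetical.

```python
def dist(x, y):
    """Discrete distance: number of noncoincident coordinates."""
    return sum(a != b for a, b in zip(x, y))


def self_train(seeds, stream):
    """Form len(seeds) patterns without a teacher.

    seeds  -- the n randomly chosen initial images r_1, ..., r_n
    stream -- the images s_1, s_2, ... shown afterwards
    Each s goes to the pattern owning the closest representative
    (ties -> smaller pattern index) and becomes a new representative.
    """
    memory = [[r] for r in seeds]
    labels = []
    for s in stream:
        best = min(range(len(memory)),
                   key=lambda i: (min(dist(s, r) for r in memory[i]), i))
        memory[best].append(s)
        labels.append(best)
    return memory, labels


seeds = [(0, 0, 0, 0), (1, 1, 1, 1)]
stream = [(0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 0)]
memory, labels = self_train(seeds, stream)
print(labels)   # [0, 0, 1]
```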

This algorithm can be used for both discrete and continuous image
spaces. The approximation method used in it is, of course, not the
only possible one. For instance, rather than approximation by
spherical neighborhoods we could use neighborhoods of any other shape.
Successful results have been obtained with approximation of the
patterns by regions bounded by hyperplanes (see, for example,
Braverman [11]). Various modifications of the metric described above
are also possible in the image space, which obviously changes the
concept of spherical neighborhoods: neighborhoods which are spherical
in one metric may not be so in another metric, and vice versa.

We shall describe several other, more specialized algorithms for
pattern recognition training which have been used successfully by
various authors. One of the simplest, although not very effective,
algorithms of this kind is the so-called perceptron of Rosenblatt
[69]. Like every device for pattern recognition, the perceptron
contains a set of receptors, the retina. Hereafter, without
stipulating this each time, we shall consider only regular rectangular
retinas. Depending on the nature of the receptors, perceptrons are
divided into continuous perceptrons (with continuous receptors) and
discrete perceptrons (with discrete binary receptors).

In addition to the receptors, each perceptron contains two other kinds
of elements, termed A-elements and R-elements.

The A-elements are simplified models of neurons; accordingly, we shall
hereafter term them simply neurons. In accordance with the nature of
the receptors used in the perceptron, we distinguish continuous and
discrete neurons. Both types of neurons have two kinds of inputs,
termed stimulating and inhibiting. Each neuron has a finite number of
inputs and a single output; in addition, some real number, termed the
weight of the neuron, is associated with it.

As the range of values of the neuron weights we take the set of all
real numbers, regardless of which neurons we are considering,
continuous or discrete.
In addition to its weight and its numbers of stimulating and
inhibiting inputs, a neuron is also characterized by its functioning
law, which determines the output signal of the neuron as a function of
its input signals and weight. We must keep in mind that the inputs of
all the neurons in the perceptron are connected to the receptors, so
that the signals generated by the receptors serve as the input signals
of the neurons.

The continuous neurons first considered by Rosenblatt [69] had a
functioning law of the form $z = v(\Sigma x - \Sigma y)$, where z is
the output signal, v is the neuron weight, $\Sigma x$ is the sum of the
signals applied to the neuron through its stimulating inputs, and
$\Sigma y$ is the sum of the signals applied through its inhibiting
inputs (x, y, v and z are arbitrary real numbers).

The functioning law of discrete neurons is normally specified by
indicating some integer termed the neuron triggering threshold, or
simply the threshold. If the algebraic sum $\Sigma x - \Sigma y$ of
the stimulating and inhibiting input signals is less than the
threshold, the neuron is considered unstimulated and delivers an
output signal equal to zero. When the sum $\Sigma x - \Sigma y$
reaches or exceeds the threshold, the neuron is stimulated and
delivers an output signal equal to its weight v (regardless of the
amount by which the sum of the input signals exceeds the threshold).
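Both functioning laws fit in two short Python functions. The names and the argument order are assumptions made for this illustration.

```python
def continuous_neuron(weight, stimulating, inhibiting):
    """Rosenblatt's continuous neuron: z = v * (sum x - sum y)."""
    return weight * (sum(stimulating) - sum(inhibiting))


def discrete_neuron(weight, threshold, stimulating, inhibiting):
    """Discrete neuron: output v if sum x - sum y >= threshold, else 0."""
    return weight if sum(stimulating) - sum(inhibiting) >= threshold else 0


print(continuous_neuron(2.0, [0.5, 1.0], [0.25]))   # 2.5
print(discrete_neuron(3, 1, [1, 1], [1]))           # 3: sum 1 reaches threshold 1
print(discrete_neuron(3, 2, [1, 1], [1]))           # 0: sum 1 is below threshold 2
```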

It is convenient to characterize discrete neurons with the described
functioning law by means of three integers (k, l, p): the first equal
to the number of stimulating inputs, the second to the number of
inhibiting inputs, and the third to the threshold. In the following
considerations the neuron weight will always be a variable quantity,
and therefore we shall not include it in the neuron characteristic.
Discrete neurons of the indicated kind having the same characteristic
triple (k, l, p) will be assigned to the same type, regardless of
possible differences in their weights.

Hereafter it is assumed that for any given perceptron designed to
distinguish k different patterns, the set of all its neurons is
partitioned into k disjoint groups (subsets) in one-to-one
correspondence with the patterns being distinguished. For brevity we
shall term the neurons belonging to the group corresponding to the ith
pattern the neurons of the ith pattern (i = 1, 2, ..., k).

The inputs of each neuron in the perceptron are connected to the
receptors of the retina, with different inputs of the same neuron
connected to different receptors. The outputs of the neurons are
connected to special summators termed R-elements; the outputs of the
neurons of the same pattern are connected to the same summator, termed
the summator of this pattern.

The output signal of the summator of any given pattern is equal to the
sum of the weights of all the stimulated neurons of this pattern. If
none of the neurons of the pattern is stimulated, the output signal of
the corresponding summator is taken to be zero. The final output
signal of the entire perceptron is taken to be the pattern whose
summator has the largest output signal. When the maximal output value
is attained simultaneously by the summators of several patterns, the
output signal of the perceptron is considered undefined.
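A sketch of how a discrete perceptron produces its response, assuming a hypothetical representation in which each neuron is a dictionary holding the receptor indices of its stimulating and inhibiting inputs, its threshold and its weight. `None` stands for an undefined output.

```python
def perceptron_response(image, neurons):
    """Pattern chosen by a discrete perceptron, or None when undefined.

    image   -- tuple of 0/1 receptor outputs
    neurons -- neurons[i] lists the neurons of pattern i; each is a dict with
               'stim' and 'inhib' (receptor indices), 'threshold', 'weight'
    """
    sums = []
    for group in neurons:
        total = 0   # summator output is zero if no neuron is stimulated
        for nrn in group:
            s = sum(image[j] for j in nrn['stim']) - sum(image[j] for j in nrn['inhib'])
            if s >= nrn['threshold']:
                total += nrn['weight']   # a stimulated neuron contributes its weight
        sums.append(total)
    best = max(sums)
    winners = [i for i, v in enumerate(sums) if v == best]
    return winners[0] if len(winners) == 1 else None


neurons = [[{'stim': [0], 'inhib': [], 'threshold': 1, 'weight': 1.0}],
           [{'stim': [1], 'inhib': [], 'threshold': 1, 'weight': 1.0}]]
print(perceptron_response((1, 0), neurons))   # 0
print(perceptron_response((1, 1), neurons))   # None: both summators output 1.0
```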

Taking as the input signal of the entire perceptron the image
projected on its retina, we obtain as the reaction of the perceptron
to this signal the pattern to which the perceptron assigns the given
image. It does not at all follow, of course, that the perceptron under
consideration accomplishes the proper classification of the images
in accordance with an a priori specified division of the set of images
into different patterns. This initial division is specified by the
operator. We shall term it the original (or a priori) classification
of the images, in contrast with the actual classification performed by
the chosen perceptron.

Therefore it is also necessary to specify some process of varying the
perceptron characteristics which makes the actual classification
performed by the perceptron approach the original classification as
the perceptron is shown various images. This process is specified by
indicating a so-called encouragement law.

As the basic encouragement law for discrete perceptrons we shall
choose a somewhat generalized form of the encouragement law of the
so-called α-systems considered by Joseph [34]. This law, which we
shall term the (generalized) α-law, is completely characterized by two
nonnegative constants a and b, not both equal to zero. Under this
encouragement law, after each showing of the next image to the
perceptron, the weights of some neurons are increased by a and the
weights of others are decreased by b (the encouragement law of
Joseph's α-systems is obtained from the generalized α-law when a = 1,
b = 0).

We distinguish two regimes of functioning of a perceptron with
generalized α-law encouragement. The first regime, termed the training
regime, consists in encouraging (increasing the weight by a) all the
stimulated neurons of the pattern to which the image considered at the
given step belongs, and penalizing (reducing the weight by b) all the
stimulated neurons of the remaining patterns. Clearly, the correct
pattern to which the given image belongs must be indicated by a human
teacher, since only he knows the original a priori classification of
the images.

The second regime, termed the self-training regime, differs from the
training regime only in that the pattern to which the image under
consideration belongs is determined by the perceptron itself: it is
taken to be the pattern which the perceptron actually delivered in
response to the showing of the given image. Of course, there is no
guarantee here that the response delivered by the perceptron will be
correct (in the sense of the original classification of the images).
However, when certain conditions are observed and the number of steps
in the self-training process increases without limit, the process can
sometimes reproduce the original classification of the images.
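The two regimes of the generalized α-law can be sketched with a hypothetical representation of each neuron as a dictionary (receptor indices of its inputs, threshold, weight). The defaults a = 1, b = 0 correspond to Joseph's α-systems; skipping images for which the perceptron's response is undefined during self-training is an assumption, since the text does not say what happens in that case.

```python
def stimulated(image, nrn):
    """True if sum(stimulating) - sum(inhibiting) reaches the threshold."""
    s = sum(image[j] for j in nrn['stim']) - sum(image[j] for j in nrn['inhib'])
    return s >= nrn['threshold']


def response(image, neurons):
    """Pattern with the largest summator output, or None on a tie."""
    sums = [sum(n['weight'] for n in group if stimulated(image, n)) for group in neurons]
    best = max(sums)
    winners = [i for i, v in enumerate(sums) if v == best]
    return winners[0] if len(winners) == 1 else None


def alpha_step(image, neurons, correct, a=1.0, b=0.0):
    """Stimulated neurons of the correct pattern gain a; those of others lose b."""
    for i, group in enumerate(neurons):
        for nrn in group:
            if stimulated(image, nrn):
                nrn['weight'] += a if i == correct else -b


def train(sequence, neurons, a=1.0, b=0.0):
    """Training regime: the teacher supplies (image, pattern) pairs."""
    for image, pattern in sequence:
        alpha_step(image, neurons, pattern, a, b)


def self_train(images, neurons, a=1.0, b=0.0):
    """Self-training regime: the perceptron's own response is treated as correct."""
    for image in images:
        r = response(image, neurons)
        if r is not None:   # assumption: an undefined response causes no update
            alpha_step(image, neurons, r, a, b)


# Pattern 0: one neuron on receptors 0 and 1; pattern 1: one neuron on receptor 1.
neurons = [[{'stim': [0, 1], 'inhib': [], 'threshold': 1, 'weight': 0.0}],
           [{'stim': [1], 'inhib': [], 'threshold': 1, 'weight': 0.0}]]
train([((1, 0), 0), ((0, 1), 1)], neurons, a=1.0, b=0.5)
print(response((1, 0), neurons), response((0, 1), neurons))   # 0 1
```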

In addition to (generalized) α-law encouragement, in several cases it
is advisable to consider two other laws, which we shall term the
(generalized) β-law and the (generalized) γ-law, respectively. Both
laws retain the principle of encouragement and penalization used in
the (generalized) α-law. In addition, in the β-law at each step (in
both the training and the self-training regimes) the weights of all
neurons (both stimulated and unstimulated) are reduced by an amount
directly proportional to their weights, with a proportionality
coefficient that is the same for all neurons. In the γ-law an
additional variation (over and above the operations of the α-law) of
the weights of all neurons (both stimulated and unstimulated) is
performed, by the same amount for every neuron, chosen at each step so
that the sum of the weights of all neurons is always equal to zero.
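A sketch of single β-law and γ-law steps on top of the α-law, using the same hypothetical dictionary representation of neurons. Assumptions: the proportional reduction of the β-law (with the hypothetical coefficient name `decay`) is applied after the α-law correction, and the γ-law shift is minus the mean weight, which is the single common amount that makes the total weight zero.

```python
def stimulated(image, nrn):
    s = sum(image[j] for j in nrn['stim']) - sum(image[j] for j in nrn['inhib'])
    return s >= nrn['threshold']


def alpha_step(image, neurons, correct, a, b):
    for i, group in enumerate(neurons):
        for nrn in group:
            if stimulated(image, nrn):
                nrn['weight'] += a if i == correct else -b


def beta_step(image, neurons, correct, a, b, decay):
    """alpha-law step, then every weight w (stimulated or not) becomes w - decay * w."""
    alpha_step(image, neurons, correct, a, b)
    for group in neurons:
        for nrn in group:
            nrn['weight'] -= decay * nrn['weight']


def gamma_step(image, neurons, correct, a, b):
    """alpha-law step, then one common shift so that all weights sum to zero."""
    alpha_step(image, neurons, correct, a, b)
    everyone = [nrn for group in neurons for nrn in group]
    shift = -sum(n['weight'] for n in everyone) / len(everyone)
    for nrn in everyone:
        nrn['weight'] += shift


def make():
    # hypothetical two-pattern perceptron: one neuron per pattern, both on receptor 0
    return [[{'stim': [0], 'inhib': [], 'threshold': 1, 'weight': 0.0}],
            [{'stim': [0], 'inhib': [], 'threshold': 1, 'weight': 0.0}]]


nb = make()
beta_step((1,), nb, 0, a=1.0, b=0.0, decay=0.5)
ng = make()
gamma_step((1,), ng, 0, a=1.0, b=0.0)
print(nb[0][0]['weight'], ng[0][0]['weight'], ng[1][0]['weight'])   # 0.5 0.5 -0.5
```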

In the case of continuous neurons the generalized α-law of
encouragement consists in that every neuron of the correct pattern
(correct a priori, or from the point of view of the perceptron)
increases its weight by the product q of the constant a and the
combined input signal of the neuron: $q = a(\Sigma x - \Sigma y)$.
Similarly, the weight of each neuron of all the remaining patterns is
reduced by the amount $b(\Sigma x - \Sigma y)$, computed individually
for each neuron. The additions that distinguish the β- and γ-laws
remain the same as in the discrete case.

In constructing the theory of perceptron training and self-training it
is frequently advisable to consider not individual perceptrons but
certain classes of perceptrons. A perceptron class is a set of
perceptrons which can differ from one another only in the way the
neurons are connected to the receptors and in the initial weights of
the neurons. All the remaining characteristics of the perceptrons
belonging to a given class are assumed to be the same. These
characteristics include the kind of receptors and neurons, the total
number of receptors and the structure of the retina, the set of images
and the set of patterns, the original classification of the images
(their distribution over the patterns), the number of neurons of each
pattern, and, finally, the encouragement law.

The way the neurons are connected to the receptors and the initial
weights of the neurons are considered random and are characterized
(within the limits of the selected class) by certain distribution
laws. In other words, a class of perceptrons is considered not as an
abstract set of perceptrons but as a set with a specified probability
field, which determines the probability of selecting a particular
concrete representative of the class under consideration. We can thus
consider that specifying the class defines some random perceptron.

The initial weights of the neurons are usually considered to be
independent random quantities having the same distribution law. In the
same way, the connection of each neuron to the retina is assumed to be
independent of the connections of the remaining neurons. With each
possible way of connecting an individual neuron to the retina there is
associated a probability of this way, common to all neurons. Thus the
connection of all the neurons of a perceptron (from the class under
consideration) to the retina is treated as a series of independent
trials characterized by the indicated probabilities.

Combining the probability characteristic for the method of connecting the neurons with the retina with the distribution law for the initial weights of the neurons, we arrive at the desired distribution law in the class of perceptrons. One of the most frequently encountered distribution laws is obtained when all the initial weights are determinate and equal to the same number (most frequently zero), and all the inputs of any given neuron are connected independently of one another according to a particular distribution law (most frequently uniform) specified directly on the retina.

In the construction of the perceptron training theory we must consider the so-called training sequences and the classes of training sequences. A training sequence is simply a finite sequence of images shown to the perceptron one after another in the process of its training or self-training. The total number of images shown (including repetitions) is termed the length of the training sequence. A class of training sequences is the set of all sequences of the same length on which there is given a distribution law defining the probability of selecting any given sequence of the considered class.

Most frequently this distribution law is obtained by assigning a definite probability of the appearance of each image from the considered set of images at each step of the training; usually we consider the case when these probabilities are identical at all the steps, i.e., when the training sequence is a series of independent experiments in which the images are selected with constant probabilities assigned to each image. Hereafter we shall limit ourselves to this case.
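As a rough illustration of this case, the following Python sketch draws a training sequence as a series of independent trials with constant image probabilities. The function name `sample_training_sequence` and the probability vector used are hypothetical; images are numbered from 1 as in the text.

```python
import random

def sample_training_sequence(probs, length, rng):
    """Draw `length` images independently; image i (numbered 1..m) is chosen
    at every step with the same probability probs[i - 1]."""
    images = list(range(1, len(probs) + 1))
    return rng.choices(images, weights=probs, k=length)

rng = random.Random(0)  # fixed seed so the run is reproducible
seq = sample_training_sequence([0.5, 0.25, 0.25], 12, rng)
print(seq)
```

Because every step uses the same weights and draws are independent, this sampler realizes exactly the "series of independent experiments with constant probabilities" described above.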

The effectiveness of training in a given class A of perceptrons with the aid of a given class B of training sequences is defined as the probability of correct recognition of the next image ξ applied to a perceptron randomly selected from the class A after the preliminary application to it of a training sequence randomly selected from the class B. We distinguish two forms of effectiveness. The so-called total effectiveness of training is obtained when the image ξ is selected at random (with the a priori fixed probabilities of the appearance of the various images used in establishing the distribution law in the class of training sequences). The training effectiveness with respect to a single image ξ is obtained when the next image presented to the perceptron for recognition is precisely the image ξ. If the probabilities of recognition errors are the same for all the images, then the total training effectiveness obviously coincides with the individual effectiveness of training with respect to any image.

In the following section we shall undertake the theoretical study of the training effectiveness for discrete perceptrons with α-law encouragement. For the moment we note that experiments have shown the training effectiveness of all the types of perceptrons described above to be relatively low. Therefore the algorithms for pattern recognition training which are used in practice normally introduce several additional improvements in comparison with the perceptron scheme. For example, in Roberts' adaptive scheme [67], which on the whole is quite similar to the scheme of the perceptron with α-law encouragement, a considerable improvement of training effectiveness is achieved by preliminary normalization of the image (in other words, by automatic shift of the image to the center of the retina and its reduction to a standard size). Methods are also used for the preliminary processing of features, schemes of multi-stage perceptrons, etc.

The  deficiencies  of  the  perceptron  and  the  ways  of  removing  them 
will  be  clearer  after  acquaintance  with  the  following  two  sections  in 
which  we  consider  some  questions  associated  with  its  behavior  in  the 
training  and  self-training  regimes. 

§6. THEORY OF TRAINING OF DISCRETE α-PERCEPTRONS

In the present section we shall outline the basic features of the theory of perceptron training. Here we shall limit ourselves to the consideration of only the discrete perceptrons with generalized α-law encouragement operating in the training regime (and not in self-training), without special stipulation of this circumstance in each case. In this case the training theory is simpler and more transparent, since it is possible to follow not the functioning of each individual neuron, but to limit ourselves to the consideration of only certain integral characteristics.

In the variant of the theory which we adopt, this integral characteristic is the so-called characteristic tensor of the perceptron.

We immediately emphasize that the use of the term "tensor" in this case is not related to any laws of transformation of its components under a change of the coordinate system, but serves only as the name for a certain integer-valued table with three indices. For the description of this table we number all the images presented to the considered perceptron by the numbers from 1 to m, and all the patterns into which these images are subdivided by the numbers from 1 to q. Then the characteristic tensor of the perceptron will be the ensemble of components $T^k_{ij}$, where the indices i and j run through the values from 1 to m and the index k runs through the values from 1 to q. $T^k_{ij}$ denotes the number of neurons of the kth pattern which are stimulated by both the ith and the jth images.

The characteristic tensor of a class of perceptrons is defined in essentially the same way. The only difference is that its components $T^k_{ij}$ are in this case not determinate, but random quantities whose distribution laws are determined in an obvious fashion by the distribution law which characterizes the method of connection of the neurons to the retina.

These definitions imply the validity of the relation

$$T^k_{ij} = T^k_{ji} \qquad (i, j = 1, 2, \ldots, m;\; k = 1, 2, \ldots, q). \qquad (76)$$

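It is also clear that any "diagonal" element of the tensor, for example $T^k_{ii}$, is the number of neurons of a particular (the kth in the present case) pattern which are stimulated under the action of the ith image. Since every neuron stimulated by both the ith and the jth images is in particular stimulated by the ith image, this implies the validity of the inequality

$$T^k_{ii} \ge T^k_{ij} \qquad (i, j = 1, 2, \ldots, m;\; k = 1, 2, \ldots, q). \qquad (77)$$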
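To make the definition concrete, here is a small Python sketch that counts the components $T^k_{ij}$ directly from the sets of images stimulating each neuron and then checks relations (76) and (77). The input format (a list of per-neuron image sets for each pattern) and the toy data are assumptions made for illustration; indices are 0-based in the code rather than 1-based as in the text.

```python
def characteristic_tensor(stimulated, m):
    """stimulated[k] lists the neurons of pattern k; each neuron is given by the
    set of images (0-based) that stimulate it.  Returns T with
    T[k][i][j] = number of neurons of pattern k stimulated by both image i and image j."""
    T = []
    for neurons in stimulated:
        Tk = [[0] * m for _ in range(m)]
        for s in neurons:
            for i in s:
                for j in s:
                    Tk[i][j] += 1
        T.append(Tk)
    return T

# Hypothetical toy perceptron: 3 images, 2 patterns (with 3 and 2 neurons).
stimulated = [
    [{0, 1}, {0}, {1, 2}],   # neurons of pattern 0
    [{2}, {1, 2}],           # neurons of pattern 1
]
T = characteristic_tensor(stimulated, 3)
for Tk in T:
    for i in range(3):
        for j in range(3):
            assert Tk[i][j] == Tk[j][i]      # relation (76)
            assert Tk[i][i] >= Tk[i][j]      # relation (77)
print(T)  # [[[2, 1, 0], [1, 2, 1], [0, 1, 1]], [[0, 0, 0], [0, 1, 1], [0, 1, 2]]]
```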

We shall term a perceptron or class of perceptrons symmetrical if the components of the characteristic tensor do not depend on the upper index, i.e., if the following relation is valid:

$$T^k_{ij} = T^l_{ij} \qquad (i, j = 1, 2, \ldots, m;\; k, l = 1, 2, \ldots, q). \qquad (78)$$

In this case the upper index is redundant, so that it is natural to characterize the symmetrical perceptrons and the classes of perceptrons not by the three-index table $(T^k_{ij})$ but by the two-index table $(T_{ij})$, where $T_{ij} = T^1_{ij} = T^2_{ij} = \ldots = T^q_{ij}$, which we shall term the characteristic matrix of the perceptron (or class of perceptrons).


Let us introduce still another notation. For any (finite) sequence of images ξ we use $U^k_\xi(i)$ to denote the output signal of the summator of the kth pattern which is induced in the considered perceptron by the ith image shown after training of the perceptron with the training sequence ξ. We use $U^k_i$ to denote the corresponding signal prior to the beginning of the training process, i.e., in other words, the signal $U^k_\xi(i)$ for the case when the training sequence ξ is empty (has a length equal to zero).

The quantities $U^k_\xi(i)$ will obviously be determinate when a particular perceptron is selected and random when a class of perceptrons is considered. Sometimes it is advisable also to consider the sequence ξ as a random sequence running through a class of training sequences.

A distinctive feature of the α-law of perceptron training is the unique property of commutativity of the training process, expressed by the following proposition.

Theorem 1. In a perceptron (or in a class of perceptrons) with α-law encouragement, the output signal $U^k_\xi(i)$ of the summator of the kth pattern under the action of the ith image after training with any sequence ξ does not change if an arbitrary permutation of the images composing the sequence ξ is performed. This is valid for any image i and any pattern k.

Indeed, the input signals of the neurons of the kth pattern which are induced by the ith image are obviously not altered in the training process, so that they remain the same after showing the training sequence ξ or any sequence η obtained from it by a permutation. Thus, the variation of the output signal of the summator in the training process is due only to the change of the weights of the neurons. By the definition of generalized α-law encouragement, the variations of the weights of the neurons with the showing of any image in the training process (but, generally speaking, not in the self-training process) do not depend on the place which the given image occupies in the training sequence. Since the overall increment of the weight of any neuron in the training process is simply the sum of the increments at each step of the process, the validity of theorem 1 is thereby completely proved.
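Theorem 1 can be checked numerically on a toy example. The Python sketch below assumes, as in the argument above, that a pattern summator outputs the sum of the weights of its stimulated neurons, and that the generalized α-law adds a to (or subtracts b from) the weight of every neuron stimulated by the shown image according to whether that image belongs to the neuron's pattern. The data and function names are hypothetical; integer constants keep the comparison exact.

```python
import itertools

def train(weights, stimulated, label, sequence, a, b):
    """Generalized alpha-law, as assumed here: when image j is shown, each neuron of
    pattern P stimulated by j gains a if label[j] == P and loses b otherwise."""
    w = [list(wk) for wk in weights]          # copy; the input is not modified
    for j in sequence:
        for P, neurons in enumerate(stimulated):
            for n, s in enumerate(neurons):
                if j in s:
                    w[P][n] += a if label[j] == P else -b
    return w

def summator_outputs(weights, stimulated, i):
    """Output of every pattern summator for image i: sum of weights of the neurons i stimulates."""
    return [sum(wn for wn, s in zip(wk, neurons) if i in s)
            for wk, neurons in zip(weights, stimulated)]

stimulated = [[{0, 1}, {0}, {1, 2}], [{2}, {1, 2}]]   # toy perceptron, 3 images
label = [0, 0, 1]      # images 0, 1 in pattern 0; image 2 in pattern 1
w0 = [[0, 0, 0], [0, 0]]                            # zero initial weights
seq = [0, 2, 2, 1]

results = set()
for perm in set(itertools.permutations(seq)):
    w = train(w0, stimulated, label, perm, a=2, b=1)
    results.add(tuple(tuple(summator_outputs(w, stimulated, i)) for i in range(3)))
print(results)  # {((6, 0), (4, 3), (0, 7))}: every ordering yields the same outputs
```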

By the use of theorem 1 we can characterize any training sequence ξ by the integer vector $v = (v_1, v_2, \ldots, v_m)$, whose ith component (for any i = 1, 2, ..., m) is equal to the number of occurrences of the ith image in the sequence ξ. Let us call this vector the characteristic vector of the sequence ξ. A class of training sequences can also be specified with the aid of the characteristic vector. However, the components of the vector in this case will be, generally speaking, not determinate, but random quantities.
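A minimal Python sketch of the characteristic vector, assuming images are numbered 1..m as in the text; the function name `characteristic_vector` is hypothetical. It also illustrates why theorem 1 lets us replace a sequence by this vector: any permutation of a sequence has the same counts.

```python
from collections import Counter

def characteristic_vector(sequence, m):
    """Component i (images numbered 1..m) = number of occurrences of image i in the sequence."""
    counts = Counter(sequence)
    return [counts[i] for i in range(1, m + 1)]

print(characteristic_vector([1, 3, 3, 2], 3))  # [1, 1, 2]
print(characteristic_vector([3, 2, 1, 3], 3))  # [1, 1, 2]: a permutation gives the same vector
```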

For the description of the perceptron training process with (generalized) α-law encouragement it is sufficient to specify the original perceptron (or class of perceptrons) by only its characteristic tensor $(T^k_{ij})$ and the matrix of the initial signals of the pattern summators $(U^k_i)$ (i, j = 1, 2, ..., m; k = 1, 2, ..., q). The training sequence ξ (or class of training sequences) is specified by its characteristic vector $(v_1, v_2, \ldots, v_m)$. In the general case all the quantities $T^k_{ij}$, $U^k_i$, $v_i$ are random. However, most frequently we consider various particular cases when certain of the indicated quantities are determinate. We note that we will usually include in the definition of the perceptron an indication of the selection of the image set and of the training sequences.

In order to avoid confusing the images and the patterns, we shall designate the patterns by Latin letters and the images, as before, by their numbers. Let us consider the question of determining the output signals of the pattern summators $U^P_\xi(i)$. It is easy to see that with the use of (generalized) α-law encouragement the quantity $U^P_\xi(i)$ is represented as the sum of the initial signal $U^P_i$ and the increments of the weights of all the neurons of the Pth pattern which are stimulated by the ith image, accumulated over all the steps of the training process.

Characterizing the class of training sequences by the characteristic vector $(v_1, v_2, \ldots, v_m)$, it is not difficult to find the expression for the overall increment of the quantity $U^P_\xi(i)$ obtained as the result of $v_j$ showings of the jth image. It follows from the definition of (generalized) α-law encouragement with the constants a and b that with each showing of the jth image any neuron of the Pth pattern which is stimulated by this image will increase its weight by the amount a if $j \in P$, and will reduce its weight by the amount b if $j \notin P$. The total number of neurons of the Pth pattern which participate in the formation of the output signal $U^P_\xi(i)$ (i.e., are stimulated by the ith image) and which are stimulated by the jth image is clearly equal to $T^P_{ij}$. Thus, the total increment of the quantity $U^P_\xi(i)$ as the result of $v_j$ showings of the jth image is expressed by the formula $aT^P_{ij}v_j$ if $j \in P$, and by the formula $-bT^P_{ij}v_j$ if $j \notin P$. Therefore the following proposition is valid.

Theorem 2. Let there be given a discrete perceptron (or class of discrete perceptrons) with the characteristic tensor $T^P_{ij}$ and the matrix of initial output signals of the pattern summators $U^P_i$ (i, j = 1, 2, ..., m; P ∈ R). If in the considered perceptron (class of perceptrons) there operates (generalized) α-law encouragement with the constants a, b, then after training with the sequence (or class of sequences) ξ with the characteristic vector $(v_1, v_2, \ldots, v_m)$, for any pattern P and any image i the output signal $U^P_\xi(i)$ of the summator of the Pth pattern under the action of the ith image is expressed by the equation

$$U^P_\xi(i) = U^P_i + a\sum_{j \in P} T^P_{ij}v_j - b\sum_{j \notin P} T^P_{ij}v_j. \qquad (79)$$
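Formula (79) can be transcribed directly. The Python sketch below is only an illustration with 0-based indices; `summator_output` and the toy tensor are hypothetical. The tensor corresponds to a small perceptron whose pattern-0 neurons are stimulated by the image sets {0, 1}, {0}, {1, 2} and whose pattern-1 neurons by {2}, {1, 2}; updating the weights step by step under the same assumptions gives the same six numbers.

```python
def summator_output(T, U0, v, label, P, i, a, b):
    """Relation (79) with 0-based indices:
    U^P_xi(i) = U^P_i + a * sum_{j in P} T^P_ij v_j - b * sum_{j not in P} T^P_ij v_j."""
    row = T[P][i]
    gain = sum(row[j] * v[j] for j in range(len(v)) if label[j] == P)
    loss = sum(row[j] * v[j] for j in range(len(v)) if label[j] != P)
    return U0[P][i] + a * gain - b * loss

# Hypothetical toy data: tensor counted by hand for 2 patterns and 3 images.
T = [[[2, 1, 0], [1, 2, 1], [0, 1, 1]],
     [[0, 0, 0], [0, 1, 1], [0, 1, 2]]]
U0 = [[0, 0, 0], [0, 0, 0]]   # zero initial summator signals
label = [0, 0, 1]             # pattern of each image
v = [1, 1, 2]                 # characteristic vector of the training sequence
outputs = [[summator_output(T, U0, v, label, P, i, a=2, b=1) for i in range(3)]
           for P in range(2)]
print(outputs)  # [[6, 4, 0], [0, 3, 7]]
```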

For any image i we shall use $P_i$ to denote the pattern to which the image i belongs in the original classification of the images. Using this notation, it is not difficult to write out the necessary and sufficient condition for the perceptron to correctly classify the ith image after training. This condition will obviously be the satisfaction of the inequality

$$U^{P_i}_\xi(i) > U^P_\xi(i) \quad \text{for all } P \ne P_i. \qquad (80)$$

Using relations (79) and (80) it is not difficult to calculate the perceptron training effectiveness in any specific case. These relations take a particularly simple form in the case of symmetrical perceptrons. Indeed, since in this case $T^P_{ij} = T^Q_{ij} = \ldots = T_{ij}$, relations (79) and (80) can be written in the form of the system of inequalities

$$U^{P_i}_i + a\sum_{j \in P_i} T_{ij}v_j - b\sum_{j \notin P_i} T_{ij}v_j > U^P_i + a\sum_{j \in P} T_{ij}v_j - b\sum_{j \notin P} T_{ij}v_j \quad \text{for all } P \ne P_i. \qquad (81)$$

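In inequality (81) the terms of the form $bT_{ij}v_j$ for which j is contained in neither $P_i$ nor P appear in both the left and right sides and therefore cancel one another. After their exclusion, and after moving the remaining b-terms (those with $j \in P$ on the left and $j \in P_i$ on the right) to the opposite sides, we obtain the simpler relations equivalent to relation (81):

$$U^{P_i}_i + (a + b)\sum_{j \in P_i} T_{ij}v_j > U^P_i + (a + b)\sum_{j \in P} T_{ij}v_j \quad \text{for all } P \ne P_i. \qquad (82)$$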
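The cancellation that leads from (81) to (82) can be spot-checked on random integer data. This Python sketch compares the two forms of the condition for many random hypothetical instances with three patterns (so that terms belonging to neither $P_i$ nor P actually occur); the function names are illustrative, indices are 0-based, and the random matrices need not satisfy (76) because the algebra does not use it.

```python
import random

def recognized_81(T, U0, v, label, i, a, b, patterns):
    """Inequality (81): compare the full summator outputs (79) of a symmetric perceptron."""
    def out(P):
        return U0[P][i] + sum((a if label[j] == P else -b) * T[i][j] * v[j]
                              for j in range(len(v)))
    Pi = label[i]
    return all(out(Pi) > out(P) for P in patterns if P != Pi)

def recognized_82(T, U0, v, label, i, a, b, patterns):
    """Reduced inequality (82): only the images of P_i and of P remain."""
    def part(P):
        return U0[P][i] + (a + b) * sum(T[i][j] * v[j]
                                        for j in range(len(v)) if label[j] == P)
    Pi = label[i]
    return all(part(Pi) > part(P) for P in patterns if P != Pi)

rng = random.Random(1)
label = [0, 0, 1, 1, 2, 2]          # 6 images split over 3 patterns
patterns = range(3)
mismatches = 0
for _ in range(2000):
    T = [[rng.randint(0, 5) for _ in range(6)] for _ in range(6)]
    U0 = [[rng.randint(-3, 3) for _ in range(6)] for _ in patterns]
    v = [rng.randint(0, 4) for _ in range(6)]
    a, b = rng.randint(0, 3), rng.randint(1, 3)
    for i in range(6):
        if recognized_81(T, U0, v, label, i, a, b, patterns) != \
           recognized_82(T, U0, v, label, i, a, b, patterns):
            mismatches += 1
print(mismatches)  # 0
```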

Inequalities (82) give the necessary and sufficient conditions for the correct classification of the ith image by a symmetric perceptron with the characteristic matrix $\|T_{ij}\|$ and the initial signals of the pattern summators $U^P_i$ after training with the sequence having the characteristic vector $(v_1, v_2, \ldots, v_m)$. These inequalities can be simplified still more for perceptrons with symmetrical initial conditions, i.e., those perceptrons (or classes of perceptrons) for which the conditions

$$U^P_i = U^Q_i \qquad (i = 1, 2, \ldots, m) \qquad (83)$$

are satisfied for all P and Q.

Using relations (83) and recalling that, by the definition of the generalized α-law, a + b > 0, we arrive at the following result.

Theorem 3. Let there be given any discrete symmetric perceptron (or class of perceptrons) with the characteristic matrix $\|T_{ij}\|$ and with symmetric initial conditions, in which there operates (generalized) α-law encouragement. Then the necessary and sufficient conditions for correct recognition of any ith image by the perceptron (class of perceptrons) after training with the sequence (class of sequences) with the characteristic vector $(v_1, v_2, \ldots, v_m)$ are expressed by the relations

$$\sum_{j \in P_i} T_{ij}v_j > \sum_{j \in P} T_{ij}v_j \quad \text{for all } P \ne P_i. \qquad (84)$$

Corollary: the training effectiveness of symmetrical discrete perceptrons with symmetric initial conditions in which (generalized) α-law encouragement operates does not depend on the selection of the (nonnegative) constants a and b which characterize the law.
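The corollary can be illustrated numerically. In this Python sketch (hypothetical names, 0-based indices, a common initial signal c standing in for symmetric initial conditions (83)), the recognition decisions obtained from the outputs (79) and condition (80) are compared with condition (84) for several pairs (a, b) with a + b > 0.

```python
import random

def recognized_84(T, v, label, i, patterns):
    """Theorem 3, relations (84): the sum over P_i exceeds the sum over every other pattern."""
    def s(P):
        return sum(T[i][j] * v[j] for j in range(len(v)) if label[j] == P)
    return all(s(label[i]) > s(P) for P in patterns if P != label[i])

def recognized_by_outputs(T, c, v, label, i, a, b, patterns):
    """Direct check of (80) using outputs (79) with symmetric initial signals U^P_i = c."""
    def out(P):
        return c + sum((a if label[j] == P else -b) * T[i][j] * v[j] for j in range(len(v)))
    return all(out(label[i]) > out(P) for P in patterns if P != label[i])

rng = random.Random(2)
label = [0, 0, 1, 1, 2]
patterns = range(3)
agree = True
for _ in range(1000):
    T = [[rng.randint(0, 4) for _ in range(5)] for _ in range(5)]
    v = [rng.randint(0, 3) for _ in range(5)]
    expected = [recognized_84(T, v, label, i, patterns) for i in range(5)]
    for a, b in [(1, 0), (0, 1), (3, 2), (5, 7)]:   # all with a + b > 0
        got = [recognized_by_outputs(T, 10, v, label, i, a, b, patterns) for i in range(5)]
        agree = agree and (got == expected)
print(agree)  # True
```

Every pair, including the conventional (1, 0), gives the same decisions, because the difference of two outputs equals (a + b) times the difference of the sums in (84).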

Thus, in the study of symmetric discrete perceptrons with symmetric initial conditions we can, without loss of generality, use conventional α-law encouragement with the constants (1, 0) rather than the generalized α-law with the constants (a, b).

In specific calculations of training effectiveness in classes of perceptrons it is usually assumed that all the neurons are connected to the retina independently of one another, and that the probability $\alpha_{ij}$ of a connection of a neuron such that it is stimulated by both the ith and the jth images is the same for all the neurons with any fixed values of i and j.


If we use $T^P$ to denote the total number of neurons of the Pth pattern, then the component $T^P_{ij}$ of the characteristic tensor of the class of perceptrons being considered can be treated as the number of occurrences of some event having the probability $\alpha_{ij}$ in $T^P$ independent trials. As a result of theorem 2 from §2 of the present chapter, the mathematical expectation $E(T^P_{ij})$ and the variance $D(T^P_{ij})$ of the random quantity $T^P_{ij}$ are expressed by the equations

$$E(T^P_{ij}) = T^P\alpha_{ij}; \qquad D(T^P_{ij}) = T^P\alpha_{ij}(1 - \alpha_{ij}) \qquad (i, j = 1, 2, \ldots, m;\; P \in R). \qquad (85)$$

With sufficiently large values of $T^P$ the quantity $T^P_{ij}$ itself can be considered normally distributed. We note also that in the case of symmetrical perceptrons the quantities $T^P$ will be equal to one another for different patterns P. Therefore we shall denote them simply by the letter T, dropping the index P. We shall term the matrix $\|\alpha_{ij}\|$ the basic probability matrix of the class of perceptrons being considered.

Similarly, a class of training sequences K which are formed with the aid of the random selection of an image at each training step, regardless of the images selected in the remaining steps, can be characterized by the probability vector $(p_1, p_2, \ldots, p_m)$ of the class being considered. For any i = 1, 2, ..., m the ith component $p_i$ of this vector is equal to the probability of selecting the ith image as the image shown at any given step of the training. In this case the ith component $v_i$ of the characteristic vector of the class K is the number of occurrences of an event having the probability $p_i$ in N independent trials, where N is the length of the training sequences of the class K (according to the definition of a class of training sequences, all the sequences occurring in the class have the same length).
With sufficiently large values of N, for any image i the quantity $v_i$ can be considered normally distributed, and its mathematical expectation and variance are given by the equations

$$E(v_i) = Np_i; \qquad D(v_i) = Np_i(1 - p_i) \qquad (i = 1, 2, \ldots, m). \qquad (86)$$
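Relations (85) and (86) are the mean and variance of a binomial count, and a quick Monte Carlo run makes this tangible. The sketch below (hypothetical parameters N = 200, p = 0.2) simulates the number of occurrences of an event of probability p in N independent trials; the same model applies to $T^P_{ij}$ with $T^P$ trials and probability $\alpha_{ij}$.

```python
import random
import statistics

def simulate_counts(N, p, trials, rng):
    """Number of occurrences of an event of probability p in N independent trials,
    repeated `trials` times (a binomial sample)."""
    return [sum(rng.random() < p for _ in range(N)) for _ in range(trials)]

rng = random.Random(3)
N, p = 200, 0.2
counts = simulate_counts(N, p, 5000, rng)
print(statistics.mean(counts), N * p)                 # sample mean vs. N p
print(statistics.pvariance(counts), N * p * (1 - p))  # sample variance vs. N p (1 - p)
```

The sample mean and variance should come out close to 40 and 32; they are estimates, so exact agreement is not expected.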

We note that the random quantities $v_i$, and also the random quantities $T^P_{ij}$, are not, generally speaking, independent for various values of i and j, which creates additional difficulties in the calculation of the probability of correct operation of the perceptron using equations (81) and (84). However, in many cases we can avoid these difficulties by introducing certain additional assumptions.

We shall demonstrate this situation using several examples.

Example 1. We consider the discrete perceptron A with neurons of the (1, 1, 1) type, having a regular square (n × n) retina and 2n images, which are chosen to be the n horizontal lines of length n, combined into the pattern P, and the n vertical lines of length n, combined into the pattern Q. All the images have the same probability (equal to 1/(2n)) of appearing in the training sequence. We assume that the perceptron A is complete. This means that both in the neuron set of the pattern P and in the neuron set of the pattern Q, for any method of connection of a neuron to the retina there is precisely one neuron having exactly this connection with the retina. In the perceptron A there operates α-law encouragement with the constants a and b, and the initial weights of the neurons are equal to zero.

We are required to find the training effectiveness of the perceptron A in the class of random training sequences of length 2N containing precisely N showings of the images of the first pattern and N showings of the images of the second pattern.

Solution. The perceptron will obviously be symmetrical and will therefore be completely characterized by its characteristic matrix $\|T_{ij}\|$. It is easy to see that a neuron is stimulated by the ith image (a vertical or horizontal line) if and only if its stimulating input is connected to a receptor lying on the corresponding line and its inhibiting input is connected to a receptor lying off this line. For any given i there are in all $n(n^2 - n) = n^2(n - 1)$ different connections of this sort. In view of the completeness of the perceptron A, the following equation is valid:

$$T_{ii} = n^2(n - 1) \qquad (i = 1, 2, \ldots, 2n). \qquad (87)$$

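Let us assume that the numbers from 1 to n designate the horizontal lines (images of the pattern P) and the numbers from n + 1 to 2n designate the vertical lines (images of the pattern Q). By analogy with the way the expression for $T_{ii}$ was found, we find two more expressions:

$$T_{ij} = 0 \qquad (88)$$

if i ≠ j and i and j are images of the same pattern (no receptor lies on two different parallel lines);

$$T_{ij} = (n - 1)^2 \qquad (89)$$

if i and j are images of different patterns (the stimulating input must then lie at the intersection of the two lines, and the inhibiting input on one of the $n^2 - (2n - 1) = (n - 1)^2$ receptors lying off both lines).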

Using E to denote the unit matrix of order n and D to denote the square matrix of order n all of whose elements are equal to unity, we represent the characteristic matrix M of the perceptron being considered in the form

$$M = \begin{pmatrix} n^2(n - 1)E & (n - 1)^2 D \\ (n - 1)^2 D & n^2(n - 1)E \end{pmatrix}. \qquad (90)$$
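Relations (87)-(90) can be confirmed by brute force for small n. The Python sketch below enumerates every ordered pair (stimulating receptor, inhibiting receptor) with distinct receptors exactly once, mimicking the complete perceptron A, and uses the firing rule stated above: a (1, 1, 1) neuron responds to a line when its stimulating input lies on the line and its inhibiting input does not. Function names are hypothetical and images are numbered 0..2n-1 (horizontal lines first).

```python
from itertools import product

def lines(n):
    """Images 0..n-1: horizontal lines (pattern P); n..2n-1: vertical lines (pattern Q)."""
    rows = [frozenset((r, c) for c in range(n)) for r in range(n)]
    cols = [frozenset((r, c) for r in range(n)) for c in range(n)]
    return rows + cols

def characteristic_matrix_complete(n):
    """Count, for every pair of images, the neurons stimulated by both."""
    imgs = lines(n)
    cells = list(product(range(n), repeat=2))
    T = [[0] * (2 * n) for _ in range(2 * n)]
    for e in cells:                 # stimulating input
        for x in cells:             # inhibiting input
            if e == x:
                continue
            fired = [k for k, img in enumerate(imgs) if e in img and x not in img]
            for i in fired:
                for j in fired:
                    T[i][j] += 1
    return T

def formula_90(n):
    """Matrix (90): n^2(n-1) on the diagonal, 0 inside a pattern, (n-1)^2 across patterns."""
    return [[n * n * (n - 1) if i == j else
             ((n - 1) ** 2 if (i < n) != (j < n) else 0)
             for j in range(2 * n)] for i in range(2 * n)]

for n in (2, 3, 4):
    assert characteristic_matrix_complete(n) == formula_90(n)
print("relations (87)-(89) confirmed for n = 2, 3, 4")
```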


Let $(v_1, v_2, \ldots, v_{2n})$ be the characteristic vector of the class of training sequences being considered. By the assumed condition, the components of this vector satisfy the condition

$$v_1 + v_2 + \ldots + v_n = v_{n+1} + \ldots + v_{2n} = N. \qquad (91)$$


As a result of theorem 3, we write the necessary and sufficient condition for correct recognition of any given image i (there are only two patterns here, so the right side runs over the images of the other pattern):

$$\sum_{j \in P_i} T_{ij}v_j > \sum_{j \notin P_i} T_{ij}v_j, \qquad (92)$$

or, taking account of relations (87)-(89) and (91),

$$n^2(n - 1)v_i > (n - 1)^2 N. \qquad (93)$$

We write relation (93) in the equivalent form

$$v_i > \frac{(n - 1)N}{n^2}. \qquad (94)$$

The probability of the appearance of the ith image in each of the N showings of the representatives of its pattern is equal to 1/n. Therefore for the mathematical expectation and the variance of the quantity $v_i$ we obtain the expressions

$$E(v_i) = \frac{N}{n}; \qquad D(v_i) = \frac{N}{n}\left(1 - \frac{1}{n}\right). \qquad (95)$$

In view of theorem 3 from §3 of the present chapter, with sufficiently large N the probability $q_i$ of satisfaction of inequality (94) can be calculated from the equation

$$q_i \approx 0.5 + \frac{1}{\sqrt{2\pi}}\int_0^k e^{-t^2/2}\,dt \qquad (i = 1, 2, \ldots, 2n), \qquad (96)$$

where k is the ratio of the modulus of the difference between the right side of inequality (94) and the mathematical expectation $E(v_i)$ to the mean square deviation of the quantity $v_i$, equal to the square root of the variance. In other words,

$$k = \frac{\left|\frac{(n - 1)N}{n^2} - \frac{N}{n}\right|}{\sqrt{\frac{N}{n}\left(1 - \frac{1}{n}\right)}} = \sqrt{\frac{N}{n^2(n - 1)}}. \qquad (97)$$
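Relations (96) and (97) are easy to evaluate numerically. The Python sketch below (hypothetical function names) computes k from (97), the normal approximation (96) written with `math.erf`, and, for comparison, the exact binomial probability of (94) for $v_i$ distributed as Binomial(N, 1/n), summed in log space to avoid overflow.

```python
import math

def k_value(N, n):
    """Relation (97): k = sqrt(N / (n^2 (n - 1)))."""
    return math.sqrt(N / (n * n * (n - 1)))

def q_normal(k):
    """Relation (96): 0.5 + (1/sqrt(2 pi)) * integral_0^k exp(-t^2/2) dt."""
    return 0.5 + 0.5 * math.erf(k / math.sqrt(2))

def q_exact(N, n):
    """Exact probability of inequality (94) when v_i ~ Binomial(N, 1/n)."""
    p = 1.0 / n
    threshold = (n - 1) * N / (n * n)
    total = 0.0
    for v in range(N + 1):
        if v > threshold:
            log_pmf = (math.lgamma(N + 1) - math.lgamma(v + 1) - math.lgamma(N - v + 1)
                       + v * math.log(p) + (N - v) * math.log1p(-p))
            total += math.exp(log_pmf)
    return total

n = 10
for N in (900, 3600, 8100):       # k = 1, 2, 3 for n = 10
    k = k_value(N, n)
    print(N, round(k, 3), round(q_normal(k), 4), round(q_exact(N, n), 4))
```

With k = 3, (97) gives N = 9n²(n - 1), i.e. a training sequence of length 2N = 18n²(n - 1), roughly 18n³, while the error 1 - q is about 0.13%.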

Since the value of $q_i$ does not depend on i, it coincides with the probability q of correct recognition by the perceptron A of any randomly selected image after the preliminary showing of a randomly selected training sequence of length 2N from the class of sequences being considered.

The value of the probability q is just the value of the overall effectiveness of the training of the perceptron A under the given conditions. We present the table of the values of the probability q for several values of k:

Thus, in order to reduce the probability of error of the perceptron being considered to 0.1% with random selection of the training sequence, it is necessary to make use of sequences of very great length, approximately $18n^3$. At the same time, we see immediately from inequality (92) that we can reduce this probability to zero (obtaining absolutely accurate recognition) by showing each image exactly once, i.e., with the use of a sequence having a length of only 2n. This example gives a striking demonstration of the inadvisability of using random training sequences. At the same time it indicates the serious differences between the learning mechanism described and the learning mechanism realized in the human brain.
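The claim that showing each image once suffices follows from (92) and (90): the own-pattern side equals $n^2(n - 1)$ and the other side $n(n - 1)^2$. A minimal Python check, with hypothetical function names and 0-based image numbers (horizontal lines first):

```python
def characteristic_matrix_A(n):
    """Matrix (90) of perceptron A: images 0..n-1 horizontal, n..2n-1 vertical."""
    return [[n * n * (n - 1) if i == j else
             ((n - 1) ** 2 if (i < n) != (j < n) else 0)
             for j in range(2 * n)] for i in range(2 * n)]

def recognized(T, v, n, i):
    """Condition (92) for two patterns: own-pattern sum exceeds the other pattern's sum."""
    own = range(0, n) if i < n else range(n, 2 * n)
    other = range(n, 2 * n) if i < n else range(0, n)
    return sum(T[i][j] * v[j] for j in own) > sum(T[i][j] * v[j] for j in other)

for n in (2, 5, 20):
    T = characteristic_matrix_A(n)
    v = [1] * (2 * n)   # each of the 2n images shown exactly once
    assert all(recognized(T, v, n, i) for i in range(2 * n))
print("showing every image once gives error-free recognition for n = 2, 5, 20")
```

By contrast, an image that never appears in the sequence has a zero left side in (92) and is never recognized, which is exactly the failure mode of random sequences.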

Indeed, the latter mechanism has a marked capability for extrapolation of experience, i.e., for correct recognition of images which never appeared in the training process. At the same time the perceptron described in the example considered does not give a final guarantee of correct image recognition (with random organization of the training process) even when the average number of displays of each image reaches a very large number (of the order of $n^2$).

This conclusion is associated, of course, to a certain degree with the specific nature of the example. However, it is not difficult to note that with purely random connection of the (1, 1, 1)-neurons to the retina (excluding the connection of both inputs of a neuron to the same receptor) the mathematical expectations of the components of the characteristic matrix will differ from the components of the characteristic matrix of the complete perceptron only by a constant factor, which is not significant from the point of view of the calculation of the training effectiveness. Therefore, with random connection of the neurons to the retina the most probable behavior of the resulting perceptrons will be precisely that of the complete perceptron described above.

Thus, random organization of the connections of the neurons with the retina cannot, generally speaking, provide a high quality of perceptron functioning. From theorem 3 it follows that the capability of the perceptron for extrapolation of experience increases with the increase of those components of the characteristic matrix whose indices belong to the same pattern, and with the reduction of those components whose indices belong to different patterns.

We shall say that a perceptron has absolute capability for extrapolation if for any pattern P and any image i from this pattern, training by any sequence containing the image i at least once leads to correct recognition of all the images of this pattern. We obtain the following result.

Theorem 4. In order that a discrete symmetric perceptron with symmetric initial conditions in which (generalized) α-law encouragement operates have absolute capability for extrapolation, it is necessary and sufficient that all the components $T_{ij}$ of the characteristic matrix of the perceptron whose indices belong to the same pattern be nonzero, and that all the components $T_{ij}$ whose indices belong to different patterns be equal to zero.

Indeed, let us assume that the condition of the theorem is satisfied. Then inequality (84) will be valid if for at least one image j from $P_i$ the value of $v_j$ is nonzero, since the left side of (84) is then positive while every right side vanishes. As a result of theorem 3, this means that the perceptron being considered has absolute capability for extrapolation.

Now let us assume that the condition of the theorem is not satisfied. This leads to two cases: 1) for some pattern Q there is a pair of images i, j belonging to it such that T_ij = 0; 2) there is a pair of images k, r belonging to different patterns such that T_kr ≠ 0. In the first case, by theorem 3, a learning sequence composed exclusively of the image j does not lead to correct recognition of the image i. In the second case, let us consider the learning sequence composed of one image k and any number v_r > T_kk/T_kr of images r. Then, for the recognition of the image k, substitution of these values into inequality (84) leads to the inequality T_kk > T_kr v_r. In view of the choice of v_r this inequality does not hold, which by theorem 3 means that correct recognition of the image k is impossible. Consequently, in both cases the perceptron does not have absolute capability for extrapolation, q.e.d.
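
The second case of the proof can be checked on a toy matrix. The sketch below is only illustrative: it assumes that inequality (84) has the form Σ_{j∈P(k)} T_kj v_j > Σ_{j∉P(k)} T_kj v_j (the form to which (98) reduces when every v_j = 1), and the function names and matrix entries are hypothetical.

```python
def recognized(T, labels, v, k):
    """Assumed form of inequality (84): image k is recognized correctly when
    the weighted display count of its own pattern exceeds that of the others.
    v[j] is the number of displays of image j in the training sequence."""
    own = sum(T[k][j] * v[j] for j in range(len(v)) if labels[j] == labels[k])
    other = sum(T[k][j] * v[j] for j in range(len(v)) if labels[j] != labels[k])
    return own > other

def smallest_breaking_count(T_kk, T_kr):
    """Smallest integer v_r strictly greater than T_kk / T_kr (the choice made
    in the proof); for positive integers this makes T_kk > T_kr * v_r false."""
    return T_kk // T_kr + 1

# Hypothetical 2-image example: k in pattern 0, r in pattern 1, T_kr != 0.
T = [[6, 4],
     [4, 6]]
labels = [0, 1]
v_r = smallest_breaking_count(T[0][0], T[0][1])   # 6 // 4 + 1 = 2
print(v_r, recognized(T, labels, [1, v_r], 0))      # prints: 2 False
```

With a single display of r the image k would still be recognized (6 > 4); two displays of r already destroy the recognition of k.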

Usually the images belonging to the same pattern are numbered by consecutive integers. In this case it is natural to partition the characteristic matrices of symmetric perceptrons into blocks (cells) corresponding to the different patterns. Absolute capability for extrapolation is then achieved when these matrices are block diagonal and the diagonal blocks contain no zero elements. This form of the characteristic matrices is not always completely achievable; however, any good approximation to it will, as a rule, require abandoning completely random connection of the neurons to the retina. The effect of this deviation from random connection is best demonstrated by an example.
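
The condition of Theorem 4, and the block-diagonal form it takes when images are numbered pattern by pattern, can be tested mechanically. A minimal sketch; the function name and the sample matrices are hypothetical.

```python
def has_absolute_extrapolation(T, labels):
    """Condition of Theorem 4: T[i][j] != 0 whenever images i and j belong to
    the same pattern, and T[i][j] == 0 whenever they belong to different ones."""
    m = len(labels)
    for i in range(m):
        for j in range(m):
            same = labels[i] == labels[j]
            if same and T[i][j] == 0:
                return False
            if not same and T[i][j] != 0:
                return False
    return True

# Images numbered pattern by pattern, so a good matrix is block diagonal.
labels = [0, 0, 1, 1]
T_good = [[3, 1, 0, 0],
          [1, 3, 0, 0],
          [0, 0, 3, 2],
          [0, 0, 2, 3]]
T_bad = [row[:] for row in T_good]
T_bad[0][2] = T_bad[2][0] = 1      # a cross-pattern link violates the condition
print(has_absolute_extrapolation(T_good, labels))  # True
print(has_absolute_extrapolation(T_bad, labels))   # False
```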

Example 2. Find the training effectiveness of the perceptron B, which differs from the perceptron A of example 1 only in that it retains only those neurons both inputs of which are connected to receptors lying on one horizontal or on one vertical line. The training conditions are the same as in example 1.


Solution. The perceptron B, just as the perceptron A, is obviously symmetric. It is not difficult to find that the elements of its characteristic matrix are given by the relations T_ii = n(n − 1); T_ij = 0 (i, j = 1, 2, ..., 2n; i ≠ j). The condition of correct recognition of the ith image is expressed by the inequality T_ii v_i > 0 or, equivalently, v_i > 0. In other words, for correct recognition of the ith image it is necessary and sufficient that it be shown to the perceptron at least once in the course of its training.
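
The values T_ii = n(n − 1), T_ij = 0 can be confirmed by brute force on a small retina. This sketch assumes that a (1, 1, 1)-neuron has one excitatory and one inhibitory input with threshold 1 (it fires when the excitatory receptor is lit and the inhibitory one is dark) and that T_ij counts the neurons fired by both images i and j; both are assumptions about definitions given earlier in the book, and the function names are hypothetical.

```python
from itertools import permutations

def images(n):
    """n horizontal lines followed by n vertical lines on an n x n retina;
    each image is the set of lit cells (row, col)."""
    rows = [frozenset((r, c) for c in range(n)) for r in range(n)]
    cols = [frozenset((r, c) for r in range(n)) for c in range(n)]
    return rows + cols

def fires(neuron, image):
    # Assumed (1, 1, 1)-neuron: excitatory cell lit and inhibitory cell dark.
    exc, inh = neuron
    return exc in image and inh not in image

def neurons_B(n):
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Ordered pairs of distinct cells lying on one row or on one column.
    return [(e, h) for e, h in permutations(cells, 2)
            if e[0] == h[0] or e[1] == h[1]]

def characteristic_matrix(neurons, imgs):
    # T[i][j] = number of neurons fired by both image i and image j.
    fired = [{nr for nr in neurons if fires(nr, img)} for img in imgs]
    return [[len(a & b) for b in fired] for a in fired]

n = 4
T = characteristic_matrix(neurons_B(n), images(n))
assert all(T[i][i] == n * (n - 1) for i in range(2 * n))
assert all(T[i][j] == 0 for i in range(2 * n) for j in range(2 * n) if i != j)
```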

With N random (equiprobable) displays of the images of one pattern, the probability that the ith image does not appear in the training sequence is obviously equal to (1 − 1/n)^N ≈ e^(−N/n), and the overall effectiveness of the training is expressed by the number 1 − (1 − 1/n)^N. In order to reduce the probability of incorrect operation of the perceptron to 0.1%, as was done in example 1, it is sufficient to set N = 7n, in other words, to use a training sequence of length 14n. We recall that in the first example the same training effectiveness was obtained only with a training sequence of length 18n².
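
The effectiveness formula of Example 2 is easy to evaluate. A sketch assuming equiprobable displays, which is what the factor (1 − 1/n)^N presupposes; since ln(1 − 1/n) < −1/n, the failure probability for N = 7n always stays below e^(−7) ≈ 0.00091. The function name is hypothetical.

```python
import math

def effectiveness_B(n, N):
    """1 - (1 - 1/n)**N: probability that a given image of a pattern appears
    at least once among N equiprobable displays of that pattern's n images."""
    return 1.0 - (1.0 - 1.0 / n) ** N

for n in (10, 100, 1000):
    N = 7 * n                       # displays per pattern; total length 14n
    failure = 1.0 - effectiveness_B(n, N)
    print(n, 2 * N, failure, failure < math.exp(-7))
```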

It is curious that such a sharp increase of the training effectiveness is obtained not by making the perceptron more complicated but by simplifying it: the perceptron B is obtained from perceptron A by discarding a large number of neurons poorly connected to the retina. It is easy to find that the total number of neurons in the perceptron A is n²(n² − 1), while in perceptron B it is only 2n²(n − 1). This situation once again indicates the imperfection of the perceptron learning mechanism and its significant difference from the learning process which takes place in the human brain.
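
The neuron counts can be checked by enumerating ordered (excitatory, inhibitory) pairs of distinct receptors. This assumes that perceptron A contains one (1, 1, 1)-neuron for every such ordered pair and that B keeps only the pairs lying on one row or one column; under these assumptions the counts come out as n²(n² − 1) and 2n²(n − 1), whose ratio is (n + 1)/2. The function name is hypothetical.

```python
from itertools import permutations

def count_neurons(n):
    """Return (neurons in A, neurons in B) for an n x n retina."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    pairs = list(permutations(cells, 2))   # ordered (excitatory, inhibitory)
    in_line = [p for p in pairs if p[0][0] == p[1][0] or p[0][1] == p[1][1]]
    return len(pairs), len(in_line)

for n in (3, 5, 8):
    a, b = count_neurons(n)
    assert a == n**2 * (n**2 - 1) and b == 2 * n**2 * (n - 1)
    print(n, a, b, a / b)   # the ratio A/B equals (n + 1) / 2
```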

Let us consider another example of the computation of the learning effectiveness in a class of perceptrons.

Example 3. Determine the training effectiveness of the class C of discrete symmetric perceptrons with symmetric initial conditions subject to generalized α-law encouragement. The retina, patterns and images are the same as in example 1. The number of neurons assigned to each of the two patterns is equal to N. The inputs of all the neurons are connected, independently of one another and with equal probability, to any receptor of the retina, excluding only the case in which both inputs of a neuron are connected to the same receptor. The training sequence contains each of the 2n images exactly once.

Solution. It is easy to see that the components T_ij of the characteristic matrix of the class C in which the indices i and j are different images of the same pattern are equal to zero. The condition for correct recognition of the ith image, given by theorem 3, is written in our case as

$$T_{ii} > \sum_{j \notin P_i} T_{ij}, \qquad (98)$$

where P_i denotes the pattern to which the ith image belongs.

It is easy to see that the sets of neurons of the pattern P_i which are stimulated both by the ith image and by the jth image, for different j not belonging to P_i, are pairwise disjoint. All these sets are, of course, contained in the set M_i of neurons of P_i stimulated by the ith image. Since T_ii is just the number of elements of the set M_i, for inequality (98) to hold it is necessary and sufficient that among the neurons of the pattern P_i there be at least one neuron which is stimulated by the ith image but is not stimulated by any image of the opposite pattern (the one different from P_i).

From the geometry of the images it follows directly that this condition is satisfied by the neurons both of whose inputs are connected to the same vertical line (if i is a horizontal line) or to the same horizontal line (if i is a vertical line). For any fixed i, of the total number n²(n² − 1) of different connections this condition is satisfied by only n(n − 1) connections. The probability of the desired connection is therefore equal to n(n − 1)/n²(n² − 1) = 1/n(n + 1), and the probability that such a connection occurs for none of the N neurons is equal to (1 − 1/n(n + 1))^N. Consequently, the overall training effectiveness is expressed by the equation

$$P = 1 - \left(1 - \frac{1}{n(n+1)}\right)^{N}.$$

If the number of neurons of each pattern is equal to 7n(n + 1), i.e., exceeds the total number n² of receptors by a factor of approximately 7, then the probability of incorrect operation of a perceptron randomly selected from the class C, after training by a single display of each image, will be approximately e^(−7), that is, about 0.001.
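
The counting argument of Example 3 and its final estimate can be reproduced for small n. The sketch assumes that a (1, 1, 1)-neuron fires when its excitatory receptor is lit and its inhibitory receptor is dark, and that a connection is an ordered pair of distinct receptors chosen uniformly; the function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import permutations

def fires(exc, inh, image):
    # Assumed (1, 1, 1)-neuron: excitatory cell lit, inhibitory cell dark.
    return exc in image and inh not in image

def favourable_count(n, i=0):
    """Count ordered (excitatory, inhibitory) connections fired by horizontal
    line i but by no vertical line; also return the total number of connections."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    row_i = {(i, c) for c in range(n)}
    verticals = [{(r, c) for r in range(n)} for c in range(n)]
    good = total = 0
    for e, h in permutations(cells, 2):
        total += 1
        if fires(e, h, row_i) and not any(fires(e, h, v) for v in verticals):
            good += 1
    return good, total

def failure_probability(n, N):
    # Probability that none of the N neurons of a pattern is favourable.
    return (1 - 1 / (n * (n + 1))) ** N

for n in (3, 4, 6):
    good, total = favourable_count(n)
    assert good == n * (n - 1) and total == n**2 * (n**2 - 1)
    assert Fraction(good, total) == Fraction(1, n * (n + 1))

n = 100
print(failure_probability(n, 7 * n * (n + 1)), math.exp(-7))  # both about 0.0009
```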

As mentioned above, the construction of the theory of perceptron learning reveals basic differences between the learning process realized by the perceptron and the actual learning process of the human brain. Changing from discrete neurons to continuous ones, or replacing α-law encouragement by β- or γ-law encouragement, does not significantly alter this situation. The situation may be partially rectified by adding to the processes realized in the perceptron the reconnection of those neurons which interfere with the learning process or do not significantly aid it.

We can, for example, provide for periodic checking of the weights of the neurons and random reconnection of the neurons with smaller weights. Mechanisms of this sort are realized in Roberts' adapt [67] and in Selfridge's pandemonium [72]. They increase the equipment utilization coefficient and reduce the number of neurons, which in perceptron schemes with purely random connections reaches tremendously large values.

However, this is completely inadequate for explaining such a feature of the adaptive functions of the brain as the use of particular features, distinguished in patterns already studied, to accelerate learning to recognize new patterns containing all or part of these features. It is easy to see that such a process can be realized in multistage perceptrons, i.e., in circuits in which the pattern summators of the perceptron of a lower stage are used as the receptors of the perceptron of the following stage. Here the perceptrons of the lower stages are taught to recognize individual properties of the patterns, and the perceptrons of the higher stages are trained to recognize ensembles of these properties. The corresponding alterations and complications of the encouragement laws can be accomplished in many different ways. We note that a scheme which essentially embodies the idea of the two-stage perceptron is used in the algorithm for teaching the recognition of geometric figures described in the work of Glushkov, Kovalevskiy and Rybak [29].

Introduction of these improvements still does not allow us to approach the simulation of another important characteristic of the brain, namely, the establishment, on the basis of limited experience involving only a small part of all the patterns, of the invariance of all the patterns with respect to their displacement and change of size. To achieve any success in this direction we must alter not only the construction of the perceptron but also the very methodology of the learning process. To do this we introduce the possibility that the recognizing device itself participates in organizing the learning sequence.

If, for example, the recognizing device A is shown several different images as representatives of a particular pattern, then the device A must be able to repeat the display of these images as many times as necessary to ensure their correct recognition in the future. Moreover, the device must be able to repeat the display of the same images subjected to those variations which the image of an object on the retina of the eye usually undergoes when the relative position of the eye and the object being viewed changes.

We can, of course, do without the described feedback which permits the recognizing device to alter the learning sequence. Instead, the recognizing devices themselves can be constructed so that, after the display of a particular image, the probability increases that the next step displays the same image, viewed perhaps from a different angle, or at least an image belonging to the same pattern. In other words, in training recognizing devices we must abandon the construction of the learning process according to the scheme of independent trials and go over to the more complex schemes described by Markov chains.
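
The Markov-chain display scheme suggested above can be sketched as a sequence generator. Everything here is a hypothetical illustration: the parameter `stay` (probability of remaining in the current pattern) does not come from the text. With `stay = 0` and patterns of equal size, the generator reduces to the scheme of independent trials.

```python
import random

def markov_sequence(patterns, length, stay=0.8, seed=0):
    """Hypothetical display scheme: after an image is shown, the next one is
    drawn from the same pattern with probability `stay`; otherwise a pattern is
    re-drawn uniformly (possibly the same one). `patterns` lists image ids."""
    rng = random.Random(seed)
    p = rng.randrange(len(patterns))
    seq = [rng.choice(patterns[p])]
    for _ in range(length - 1):
        if rng.random() >= stay:          # leave (or re-pick) the current pattern
            p = rng.randrange(len(patterns))
        seq.append(rng.choice(patterns[p]))
    return seq

print(markov_sequence([[0, 1, 2], [3, 4, 5]], 20))
```

Raising `stay` lengthens the runs of images from one pattern, which is the effect the text asks of such devices.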

The suggested variations in the methods of constructing learning sequences considerably improve the functioning of recognition devices in the simple learning regime. However, it is in the self-learning regime that these variations are of principal importance, since only in this direction can we hope that the classification of images performed by self-training devices will correspond to the original classification performed by a human. It is clear that the description of processes of this sort requires far more complex mathematical apparatus than that used in the present section.

§7. OPERATION OF THE DISCRETE α-PERCEPTRONS IN THE SELF-LEARNING REGIME

In the preceding section we studied the behavior of the discrete α-perceptrons in the learning regime. The characteristic feature of the learning regime is the presence of a teacher who knows the correct classification of the images. In the present section we shall study some questions associated with the behavior of the discrete α-perceptrons in the self-learning regime. In this case there is no teacher, and the processes of self-organization which lead to changes in the image classification performed by the perceptron are determined by the positive feedback introduced into the perceptron circuit.

It is well known that the analysis of the behavior of perceptrons in the self-learning regime made by Rosenblatt [69] is very far from mathematically rigorous. The absence of rigorously proved propositions in this field leads some authors (particularly in popular science publications) to ascribe to perceptron self-learning many properties which in actuality it does not and cannot possess. On the basis of the considerations of the present section it is not difficult to draw several conclusions which outline the boundaries of the actual possibilities inherent in the self-learning of perceptrons.

Let us consider a discrete α-perceptron designed for the recognition of two patterns P and Q. As the single output signal of the perceptron we shall take the difference of the signals of the summators of the patterns P and Q:

$$V_i(\ell) = U_i^{P}(\ell) - U_i^{Q}(\ell). \qquad (99)$$

Here, just as in the equations of the preceding section, the index i runs through all the images of both patterns P and Q, ℓ is any sequence of images shown to the perceptron in the process of its self-training, and U_i^P(ℓ), U_i^Q(ℓ) denote the signals of the summators of the patterns P and Q under the action of the ith image after the sequence ℓ.

Just as in the case of training, it is not difficult to show that the functioning of the symmetric perceptron in the self-training regime is determined by the sum a + b of the encouragement and penalty constants, and not by these constants considered separately. Since, in addition, the scale can be varied arbitrarily, we may assume without loss of generality that a = 1 and b = 0.

Henceforth we shall always make this assumption.

Using ‖T_ij‖ to denote the characteristic matrix of the perceptron and recalling the definition of α-law encouragement, we easily obtain the equation

$$V_i(\ell j) = V_i(\ell) + T_{ij}\,\operatorname{sign} V_j(\ell). \qquad (100)$$

Here the symbol ℓj denotes the image sequence ℓ to which the image j is appended.

Equation (100) is valid for any pair of images i, j and for any image sequence ℓ. The function sign x, as usual, is taken equal to +1 for positive x and to −1 for negative x. Clearly, when V_j(ℓ) is zero, the exact meaning of the encouragement law (positive feedback) leaves the quantity sign V_j(ℓ) in equation (100) undefined. To avoid this indefiniteness, we shall henceforth, by definition, regard zero as a positive quantity, so that sign 0 = +1.

Keeping in mind this modification of the definition of the function sign x, we shall regard equation (100) as a recurrent specification of the vector V(ℓ) = (V_1(ℓ), V_2(ℓ), ..., V_m(ℓ)), which gives the output signals of the perceptron under the action of each image j = 1, 2, ..., m after the image sequence ℓ has been applied to the perceptron input. The initial value of this vector, V(0) = (V_1(0), V_2(0), ..., V_m(0)), is assumed to be given. The image j is assigned by the perceptron to the pattern P or to the pattern Q according to whether the corresponding component V_j(ℓ) of this vector is positive or negative (we recall that zero, by the adopted convention, is considered a positive number).
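
Equation (100), together with the convention sign 0 = +1, can be iterated directly. A minimal sketch (with a = 1, b = 0 as assumed above); indices are 0-based, and the function names and sample values are hypothetical.

```python
def sign(x):
    # Convention of the text: zero counts as positive, so sign(0) = +1.
    return 1 if x >= 0 else -1

def self_train(T, V0, sequence):
    """Equation (100): displaying image j adds T[i][j] * sign(V[j]) to every
    output V[i]; the sign is taken before the update."""
    V = list(V0)
    for j in sequence:
        s = sign(V[j])
        for i in range(len(V)):
            V[i] += T[i][j] * s
    return V

def classification(V):
    # 'P' for a nonnegative output signal, 'Q' for a negative one.
    return ['P' if v >= 0 else 'Q' for v in V]

T = [[2, 1, 0], [1, 2, 0], [0, 0, 2]]
V = self_train(T, [0, -1, -3], [0, 0, 1])
print(V, classification(V))   # [5, 3, -3] ['P', 'P', 'Q']
```

In this toy run image 1 starts in Q and is moved to P by the positive feedback coming from image 0.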

Since all the quantities T_ij are integers (and also nonnegative), the problem of the operation of the perceptron in the self-training regime reduces in essence to the problem of a random walk over a discrete lattice in a space of dimension ≤ m. Assuming that the images are displayed in the self-learning process according to the scheme of independent trials, it is easy to see that the transition probabilities from any point of such a lattice are determined only by the set of signs of the coordinates of this point.

It is not difficult to see that the set of signs of the coordinates of any point of the lattice also defines the image classification performed by the perceptron whose vector of output signals is the radius vector of this point. From the point of view of perceptron theory, of prime interest is the limit distribution of the signs of the coordinates of the vector V(ℓ) as the length of the training sequence ℓ increases without bound. The analysis made above shows that the required distribution is obtained from the limit distribution of the Markov chain corresponding to the walk over the discrete lattice described above.

Since this chain has an infinite number of states, finding the limit distribution in the general case is quite complex. We can, however, point out several cases in which finding the limit distribution easily reduces to the study of a Markov chain with a finite number of states.

Let us consider as an example the discrete symmetric α-perceptron A designed for the recognition of 2n images, the first n of which belong to the pattern P and the last n to the pattern Q. Assume further that the elements of the characteristic matrix of the perceptron A satisfy T_ij = a > 0 if i and j belong to the same pattern, and T_ij = 0 if i and j belong to different patterns. By theorem 4 of §6 of the present chapter, the perceptron being considered has absolute capability for extrapolation and, consequently, behaves in the best possible way in the learning regime (it learns correct recognition as a result of being shown at least one image of each pattern). Let us take as the initial conditions V_i(0) = b (b > 0) for i = 1, 2, ..., m (m ≤ n) and V_j(0) = −b for j = m + 1, m + 2, ..., n, ..., 2n.

From equation (100) it follows directly that V_j(ℓ) < 0 for any sequence ℓ when j = n + 1, n + 2, ..., 2n. The remaining components are expressed by the equations V_i(ℓ) = b + ka for i = 1, 2, ..., m and V_i(ℓ) = −b + ka for i = m + 1, m + 2, ..., n, where k is the difference between the number of appearances of images whose components V_j(ℓ′) are positive at the moment of their display and the number of appearances of images whose components V_j(ℓ′) are negative (ℓ′ being the corresponding initial segment of the sequence ℓ).

Let us assume that the self-training process follows the scheme of independent trials with equal probabilities of appearance of all the images. Since in the case considered the display of images of one pattern has no effect on the recognition of the images of the other pattern, we can without loss of generality assume that only the images of the pattern P take part in the self-training process (the images of the pattern Q always produce a negative output signal, whether or not they are included in the self-training process).

Let us assume that b is not a multiple of a, and let t denote the integer [b/a] + 1. It is not difficult to see that for the study of the functioning of the perceptron A only those values of the parameter k are of interest which lie in the closed interval [−t, t]. Indeed, if in the self-training process the quantity k reaches the value t even once, then from that moment on, by equation (100), it can only increase, and the perceptron will deliver a positive output signal for all images of the pattern P (which corresponds to correct classification). Similarly, if the parameter k takes the value −t, the perceptron will deliver a negative output signal for all images (which actually means the absence of any image classification, since all the images are assigned by the perceptron to the same pattern).

Now, as is easily seen, the limit behavior of the perceptron A is determined by the Markov chain with 2t + 1 states k = −t, −t + 1, ..., −1, 0, 1, ..., t − 1, t. In view of the assumption made about the probabilities of appearance of the images in the self-training process, for any k different from t and −t the probability of transition into the state k + 1 is equal to m/n, and the probability of transition into the state k − 1 is equal to (n − m)/n. From the state t (just as from the state −t) a transition is possible only into the same state, since from the point of view of the functioning of the perceptron all states with k > t (correspondingly, with k < −t) do not differ from the state k = t (correspondingly, from the state k = −t).

Introducing the notation p = m/n and q = (n − m)/n, we obtain for the considered Markov chain the matrix of transition probabilities
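
$$
M = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
p & 0 & q & 0 & \cdots & 0 & 0 & 0 \\
0 & p & 0 & q & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & p & 0 & q \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
$$

(the rows and columns are numbered 1, 2, ..., 2t + 1 and correspond to the states k = t, t − 1, ..., −t).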

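This matrix has unity as a double characteristic root. The probabilities that the chain, started from the state k = 0 (row t + 1), passes into the states t and −t are equal to the limit transition probabilities p^∞_{t+1,1} and p^∞_{t+1,2t+1}. For the probability p^∞_{t+1,i} the Perron formula gives

$$p^{\infty}_{t+1,\,i} = \left.\frac{d}{d\lambda}\left[\frac{(\lambda-1)^{2}\,M_{i,\,t+1}(\lambda)}{\Delta(\lambda)}\right]\right|_{\lambda=1}, \qquad (101)$$

where Δ(λ) = |λE − M| is the characteristic polynomial of M and M_{i,t+1}(λ) is the cofactor of the element (i, t + 1) of the matrix λE − M.

It is easy to see that M_{i,t+1}(λ), for i different from 1 and from 2t + 1, contains the factor (λ − 1)², and therefore p^∞_{t+1,i} = 0 for all such i. For i = 1 we have M_{1,t+1}(λ) = (λ − 1)M_1(λ), and for i = 2t + 1 we have M_{2t+1,t+1}(λ) = (λ − 1)M_2(λ), where M_1(λ) = p^t Q(λ) and M_2(λ) = q^t R(λ).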

From equation (101) we easily obtain p^∞_{t+1,1} = cp^t and p^∞_{t+1,2t+1} = cq^t.

Since all the remaining limit transition probabilities in the (t + 1)th row are equal to zero, the stochasticity of the matrix of limit transition probabilities gives the value of c: c = 1/(p^t + q^t). We have thereby proved the following proposition.

With unlimited continuation of the self-training process, the perceptron A described above establishes the correct classification of the images with probability p^t/(p^t + q^t), and assigns all the images to the same pattern with probability q^t/(p^t + q^t).
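
The proposition can be checked numerically by iterating the absorbing random walk on k = −t, ..., t started at k = 0 and comparing the mass absorbed at k = t with p^t/(p^t + q^t) (the classical gambler's-ruin answer). A sketch with hypothetical function names; the number of iterations is an assumption chosen so that the remaining transient mass is negligible for the small t used here.

```python
def absorption_probability(p, t, steps=5000):
    """Walk on k = -t..t from k = 0: +1 with prob. p, -1 with prob. q = 1 - p,
    states +-t absorbing. Returns the probability mass sitting at k = +t after
    `steps` steps, an approximation of the limit transition probability."""
    q = 1.0 - p
    size = 2 * t + 1                 # list index = k + t
    dist = [0.0] * size
    dist[t] = 1.0                    # start at k = 0
    for _ in range(steps):
        new = [0.0] * size
        new[0] += dist[0]            # k = -t is absorbing
        new[-1] += dist[-1]          # k = +t is absorbing
        for idx in range(1, size - 1):
            new[idx + 1] += p * dist[idx]
            new[idx - 1] += q * dist[idx]
        dist = new
    return dist[-1]

def closed_form(p, t):
    q = 1.0 - p
    return p**t / (p**t + q**t)

for p, t in [(0.6, 3), (0.55, 5), (0.5, 4)]:
    print(p, t, absorption_probability(p, t), closed_form(p, t))
```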

The considered example, as the attentive reader will easily note, strictly speaking cannot be realized in a real perceptron, except in the trivial cases m = n, p = 1, q = 0 and m = 0, p = 0, q = 1. The reason is that, under the assumptions made about the characteristic matrix, all the images of the same pattern stimulate the same set of neurons. Therefore, the output signals induced by the images of the same pattern must always be equal to one another, including at the initial moment.

It is not difficult, however, to note that by setting T_ii = a + δ for all i = 1, 2, ..., 2n (δ > 0), we obtain the possibility of satisfying the initial conditions introduced in the example. Moreover, if δ is significantly smaller than a, and t is relatively large, then the perceptron behavior described in the example can serve as a good approximation of its real behavior.

Let us consider the complete discrete α-perceptron B with (1, 1, 1)-neurons and a square (n × n) retina, designed for the recognition of two patterns P and Q. The pattern P consists of n horizontal lines, and the pattern Q of n vertical lines. Each of these lines constitutes an individual image. In the preceding section it was noted that the perceptron B could be regarded as the most characteristic representative of the class of perceptrons with random connections of the neurons to the retina. According to theorem 1 and the corollary following it in the work of Rosenblatt [69], such perceptrons, constructed using continuous neurons, must in self-learning tend, with probability arbitrarily close to unity, to a state in which all the images are assigned to the same pattern. Let us show that this statement is not valid for the perceptron B.

It is easy to see that any initial conditions can be chosen for the perceptron B. We shall call the smallest of the numbers |V_i(0)| (i = 1, 2, ..., 2n) the lower bound of the moduli of the initial conditions. Under the assumptions made, the following theorem is valid.

Theorem 1. For any arbitrarily small positive number ε there is a number S such that, when the lower bound of the moduli of the initial conditions exceeds S, the perceptron B in the self-training regime (with all images appearing with equal probability) retains the initial classification of the images with probability P > 1 − ε.

Proof. We use N to denote the length of the training sequence ℓ and v_i to denote the number of appearances of the ith image (i = 1, 2, ..., 2n) in this sequence. Let V_i(0) = x_i (i = 1, 2, ..., 2n); let y_i be the set of all indices j (images) belonging to the pattern opposite to that of i and such that the sign of x_j coincides with the sign of x_i; and let z_i be the set of all indices j belonging to the pattern opposite to that of i and such that the sign of x_j is opposite to the sign of x_i (as before, zero is here considered to be a positive number).

As was shown in the preceding section, an arbitrary element T_ij of the characteristic matrix of the perceptron B is equal to n²(n − 1), 0 or (n − 1)², depending on whether the indices i and j coincide, do not coincide but belong to the same pattern, or do not coincide and belong to different patterns. Using this fact, with the aid of equation (100) we easily find that the original classification of the images is retained in the self-training process if for all N = 1, 2, ... the following inequalities are satisfied:

$$n^{2}(n-1)\,v_i + (n-1)^{2}\Big(\sum_{j\in y_i} v_j - \sum_{j\in z_i} v_j\Big) + |x_i| > 0 \qquad (i = 1, 2, \ldots, 2n),$$

and all the more so if

$$n^{2}(n-1)\,v_i - (n-1)^{2}\sum_{j\in z_i} v_j + x > 0 \qquad (i = 1, 2, \ldots, 2n), \qquad (102)$$

where x is the smallest of the numbers |x_i| (i = 1, 2, ..., 2n). In turn, it is not difficult to verify that inequalities (102) are satisfied if the inequalities

$$\left|\frac{v_i}{N} - \frac{1}{2n}\right| < \frac{1}{2n(2n-1)}\left(1 + \frac{2x}{N(n-1)}\right) \qquad (i = 1, 2, \ldots, 2n) \qquad (103)$$

are satisfied.


Denoting the quantity 1/4n² by the letter δ, it is evident that inequalities (103) will be satisfied if the inequalities

$$\left|\frac{v_i}{N} - \frac{1}{2n}\right| < \delta \qquad (i = 1, 2, \ldots, 2n) \qquad (104)$$

are satisfied.
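
The arithmetic behind the choice δ = 1/4n² can be checked with exact fractions. If every frequency v_j/N lies within δ of 1/(2n) and, in the worst case, z_i contains all n images of the opposite pattern, the quantity n²(n − 1)v_i − (n − 1)² Σ_{j∈z_i} v_j is at least N(n − 1)/(4n), which is positive for n ≥ 2 even before the initial term x is added. A sketch; the function name is hypothetical.

```python
from fractions import Fraction

def worst_case_margin(n, N):
    """Lower bound of n^2(n-1) v_i - (n-1)^2 * sum_{j in z_i} v_j when every
    |v_j / N - 1/(2n)| <= delta = 1/(4n^2) and z_i holds all n opposite images."""
    delta = Fraction(1, 4 * n * n)
    v_i_min = N * (Fraction(1, 2 * n) - delta)
    opposite_max = n * N * (Fraction(1, 2 * n) + delta)
    return n * n * (n - 1) * v_i_min - (n - 1) ** 2 * opposite_max

for n in (2, 3, 10):
    for N in (1, 50, 1000):
        m = worst_case_margin(n, N)
        assert m == Fraction(N * (n - 1), 4 * n) and m > 0
```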

By theorem 4 of §3 of the present chapter, there exist positive constants a and b (unrelated to the constants used above), not depending on N, such that the probability P_N that at least one of the inequalities (104) fails for a fixed value of N satisfies the estimate P_N ≤ a e^(−bN). The probability that at least one of the inequalities (104) fails for some value of N from M to ∞ does not exceed the sum of the series

$$\sum_{N=M}^{\infty} a e^{-bN} = \frac{a e^{-bM}}{1 - e^{-b}} = R(M).$$

As M → ∞ the quantity R(M) tends to zero. Let us take M so that R(M) < ε.

Now, taking S = 2(M − 1)n²(n − 1), we find that for x > S the inequalities (103) are satisfied for all values N = 1, 2, ..., M − 1. By the choice of M, for all the remaining values of N the inequalities (104), and hence (103), are satisfied with probability greater than 1 − ε. Since satisfaction of the inequalities (103) for all values of N from 1 to ∞ means retention of the original classification of the images, the theorem is proved.
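
The choice of M and S in the proof can be made concrete once values are fixed for the constants a and b of theorem 4 of §3. That theorem only asserts that such constants exist, so the numbers below are purely hypothetical, as are the function names.

```python
import math

def R(M, a, b):
    # Tail bound: sum over N >= M of a * exp(-b * N), a geometric series.
    return a * math.exp(-b * M) / (1.0 - math.exp(-b))

def choose_M(a, b, eps):
    """Smallest M >= 1 with R(M) < eps (R decreases as M grows)."""
    M = 1
    while R(M, a, b) >= eps:
        M += 1
    return M

def threshold_S(M, n):
    # S = 2(M - 1) n^2 (n - 1), as in the proof.
    return 2 * (M - 1) * n**2 * (n - 1)

a, b, eps, n = 2.0, 0.05, 1e-3, 10    # hypothetical constants
M = choose_M(a, b, eps)
print(M, threshold_S(M, n))
```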

Theorem 1 shows that, with sufficiently large initial values of the output signals for all the images, the considered perceptron is practically devoid of the capability not only for self-training but even for simply changing the image classification initially assigned to it. From the proof of the theorem it is easy to see that the remaining weak capability for self-alteration has its maximal value in the case of the correct initial classification. In other words, the perceptron has the least tendency to retain the correct method of functioning.

Let us now consider β-law encouragement. To do this, let us fix an arbitrary number β between zero and unity and consider an arbitrary symmetric perceptron C with β-law encouragement whose characteristic matrix is diagonal, i.e., has nonzero elements only on the principal diagonal. As shown in the preceding section, this property is possessed by the symmetric perceptron with (1, 1, 1)-neurons which is designed for the recognition of horizontal and vertical lines and in which the inputs of each neuron are connected to the elements of the retina located on one horizontal or one vertical line.

In the case of β-law encouragement the basic recurrence relation for the determination of the output signals is written

$$V_i(\xi j) = (1 - \beta)\,V_i(\xi) + \tau_{ij}\,\operatorname{sign} V_j(\xi). \qquad (105)$$

The notation here is exactly the same as in equation (100), and this relation is also valid for any discrete symmetric perceptron. For perceptrons with a diagonal characteristic matrix both terms on the right side of equations (100) and (105) always have the same sign (the case when the second term equals zero is excluded from consideration). The validity of the following proposition follows directly.

Theorem 2. The discrete symmetric perceptron C with a diagonal characteristic matrix is completely devoid of the capability for self-learning (i.e., it retains unchanged any given original image classification) both in the case of α-law encouragement and in the case of β-law encouragement.
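Theorem 2 can be illustrated by simulating the self-learning recurrence. The sketch assumes that (100) has the α-law form V_i ← V_i + τ_ij sign V_j (obtained from (105) with β = 0), that images are presented uniformly at random, and it uses a hypothetical 3×3 diagonal matrix. Whatever sequence is drawn, the sign of every output signal stays as it was initially.

```python
import random

def sign(x):
    """Sign of an output signal: +1, -1, or 0."""
    return (x > 0) - (x < 0)

def self_learning_step(V, tau, j, beta=0.0):
    """Update all output signals after image j is shown in the self-learning regime.

    beta = 0 gives the alpha-law form V_i + tau_ij * sign V_j (assumed form of (100));
    0 < beta < 1 gives the beta-law recurrence (105).
    """
    s = sign(V[j])
    return [(1.0 - beta) * V[i] + tau[i][j] * s for i in range(len(V))]

def run(V0, tau, steps, beta=0.0, seed=0):
    """Apply `steps` updates with images drawn uniformly at random (an assumption)."""
    rng = random.Random(seed)
    V = list(V0)
    for _ in range(steps):
        V = self_learning_step(V, tau, rng.randrange(len(V)), beta)
    return V

# Hypothetical diagonal characteristic matrix and initial output signals.
tau = [[2.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 3.0]]
V0 = [0.5, -0.2, 1.0]
for beta in (0.0, 0.3):
    V = run(V0, tau, 1000, beta)
    print(beta, [sign(v) for v in V], [sign(v) for v in V0])
```

With a diagonal matrix, showing image j only adds τ_jj sign V_j to V_j itself, a term of the same sign as V_j, while the factor 1 − β > 0 never flips any sign.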

The validity of the following proposition is also easy to see.
Theorem 3. No discrete symmetric perceptron (with either α- or β-law encouragement) operating in the self-training regime can alter the original image classification if this classification assigns all the images to the same pattern.
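A short sketch of why Theorem 3 holds, under the stated assumption that the entries of the characteristic matrix are nonnegative (the random symmetric matrix below is hypothetical). If every output signal has the same sign, every increment τ_ij sign V_j has that sign too, so the common sign persists.

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def step(V, tau, j, beta=0.0):
    """Self-learning update after image j: beta = 0 for the alpha-law, 0 < beta < 1 for (105)."""
    s = sign(V[j])
    return [(1.0 - beta) * V[i] + tau[i][j] * s for i in range(len(V))]

rng = random.Random(1)
n = 4
# Hypothetical symmetric characteristic matrix; nonnegative entries are an assumption here.
tau = [[0.0] * n for _ in range(n)]
for i in range(n):
    for k in range(i, n):
        tau[i][k] = tau[k][i] = rng.uniform(0.0, 2.0)

V = [0.3, 1.2, 0.7, 2.0]   # all images initially assigned to the "positive" pattern
for _ in range(1000):
    V = step(V, tau, rng.randrange(n), beta=0.2)
print(all(v > 0 for v in V))   # every increment is nonnegative, so no sign can flip
```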

Our results can be considered counterexamples to the results of Rosenblatt [69], insofar as his discussions relate not only to continuous but also to discrete neurons. In any case these results indicate that the asymptotic behavior of perceptrons in the self-training regime is far more complex and requires considerably more precise techniques for its study than the purely qualitative techniques used by Rosenblatt [69].

For a visual picture of the peculiarities of the behavior of perceptrons in the self-learning regime, as compared with the learning regime, let us consider the case when the number of images is two (each pattern consists of a single image). This case permits a simple graphical interpretation.

First we note that when there are two patterns (but an arbitrary number of images) the functioning of the perceptron in both the learning and the self-learning regimes is conveniently characterized by a vector with the components $V_i(\xi)$ (see above). In the learning regime the basic recurrence relation for these components obviously has the form

$$V_i(\xi j) = V_i(\xi) \pm \tau_{ij}. \qquad (106)$$

This relation is valid for any pair of images i, j and for any training sequence ξ. The second term on the right side is taken with the plus sign if in the correct classification the image j corresponds to a positive output signal, and with the minus sign if the corresponding output signal must be negative.
Let us consider the discrete symmetric perceptron with α-law encouragement whose characteristic matrix is

$$\begin{pmatrix} a & b \\ b & a \end{pmatrix},$$

where a > b > 0. Let us assume that with correct classification the first image must induce a positive output signal and the second image a negative output signal. Plotting the coordinate $V_1(\xi)$ along the horizontal axis and the coordinate $V_2(\xi)$ along the vertical axis, we associate with each vector $(V_1(\xi), V_2(\xi))$ some point of the plane. Selecting one point in each quadrant, we obtain a visual impression of the action of equation (106) (Fig. 13).

Fig. 13

In Fig. 13 the letter $\tau_1$ denotes the vector (a, b) and the letter $\tau_2$ denotes the vector (b, a). The characteristic feature of the training regime is that the directions of these vectors (which define the random walk of the point on the lattice) do not depend on the position of the point on the plane. Their resultant is always directed toward the quadrant in which the signs of the coordinates (the output signals of the perceptron) coincide with the correct classification (in the present case this is the hatched fourth quadrant).
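The learning-regime walk of Fig. 13 can be simulated directly from (106). In this sketch a = 3 and b = 1 are illustrative values, and the two images are assumed to be presented with probability 1/2 each. Showing image 1 adds τ_1 = (a, b), showing image 2 subtracts τ_2 = (b, a), independently of where the point is, so the mean step is ((a − b)/2, (b − a)/2) and every walk drifts into the fourth quadrant.

```python
import random

def learning_walk(x, y, a, b, steps, rng):
    """Random walk of (V_1, V_2) in the learning regime, equation (106).

    Image 1 (correct output positive) adds tau_1 = (a, b);
    image 2 (correct output negative) subtracts tau_2 = (b, a).
    The step never depends on the current position.
    """
    for _ in range(steps):
        if rng.random() < 0.5:
            x, y = x + a, y + b
        else:
            x, y = x - b, y - a
    return x, y

rng = random.Random(0)
a, b = 3, 1
for start in [(5, 5), (-5, 5), (-5, -5), (5, -5)]:   # one starting point in each quadrant
    x, y = learning_walk(*start, a, b, 2000, rng)
    print(start, "->", (x, y), "fourth quadrant:", x > 0 and y < 0)
```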

For  the  case  of  the  self-training  regime  the  interpretation  of  the 
corresponding  equation  (100)  is  shown  in  Fig.  14.  In  contrast  with  the 
previous  case,  the  directions  of  the  vectors  which  define  the  random 
walks  are  different  in  the  different  quadrants.  The  designations  of  the 
vectors  are  the  same  as  in  Fig.  13. 


It is not difficult to see the qualitative differences between the situation shown in Fig. 14 and that shown in Fig. 13. First of all, the first and third quadrants (shown shaded in Fig. 14) now possess a trapping property: a point which falls into one of these quadrants in the course of the random walk can never escape from it.

Entrance into these quadrants actually means the absence of any classification (both images are assigned to the same pattern). Moreover, from the quadrants corresponding to the correct classification up to the sign of the output signals (the second and fourth quadrants) there is always a nonzero probability of exit into the neighboring quadrants.

Considering the resulting situation in a purely qualitative way, similar to the approach of Rosenblatt [69], we would have to conclude that the perceptron under study tends asymptotically to a state in which output signals of the same sign (absence of any classification) are generated for all images. A more thorough consideration (repeating the analysis made in the proof of theorem 1) leads, however, to a completely different conclusion: just as in the case of theorem 1, if the initial point is sufficiently far from the boundaries of the quadrant, the probability that the random walk never leaves this quadrant at any subsequent instant of time (all the way to infinity) can be made arbitrarily close to unity.
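Both effects can be observed in a small simulation of the self-training walk of Fig. 14. The sketch takes (100) to mean that image 1 adds sign V_1 · (a, b) and image 2 adds sign V_2 · (b, a). The values a = 3, b = 1, equiprobable images and the finite horizon (a stand-in for "all the way to infinity") are all assumptions of the illustration. Walks started in the first or third quadrant never leave it. Walks started in the fourth quadrant can leave it, but the fraction that stays approaches one as the starting point moves away from the axes.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def self_training_walk(x, y, a, b, steps, rng):
    """Random walk of (V_1, V_2) in the self-training regime (equation (100)).

    Image 1 adds sign(V_1) * (a, b); image 2 adds sign(V_2) * (b, a).
    Returns the final point and whether the walk ever left its starting quadrant.
    """
    start = (sign(x), sign(y))
    left = False
    for _ in range(steps):
        if rng.random() < 0.5:                 # image 1 shown
            s = sign(x)
            x, y = x + s * a, y + s * b
        else:                                  # image 2 shown
            s = sign(y)
            x, y = x + s * b, y + s * a
        left = left or (sign(x), sign(y)) != start
    return (x, y), left

def stay_probability(d, a=3, b=1, steps=300, trials=500, seed=0):
    """Share of walks from (d + 0.5, -(d + 0.5)) in the fourth quadrant that never leave it
    within `steps` steps (a finite stand-in for 'up to infinity')."""
    rng = random.Random(seed)
    x0 = d + 0.5                               # half-integers never land exactly on an axis
    stays = sum(not self_training_walk(x0, -x0, a, b, steps, rng)[1] for _ in range(trials))
    return stays / trials

for d in (0, 2, 5, 20):
    print(d, stay_probability(d))
```

Inside the fourth quadrant the mean step is ((a − b)/2, (b − a)/2), pointing away from both axes. This is why a distant starting point is almost never lost, although exit near the boundary remains possible.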

This once again underscores the danger of basing general conclusions about the asymptotic behavior of perceptrons on arguments of a purely qualitative character, without confirming them by exact computations and estimates.

§8.  LOGICAL  CLASSIFICATION  SYSTEMS  AND  CONDITIONAL  PROBABILITY  MACHINES 

The pattern recognition systems considered in the preceding sections are devices for the classification of certain subsets of the image space. By orienting ourselves toward visual, auditory and other patterns which are continuous in nature, we to a certain degree transferred the property of continuity to the corresponding classification systems. Indeed, even in the case of clearly discrete receptors the hypothesis of the N-extrapolatability of the patterns, which presumes continuity of the patterns, permitted only those sets which were "well arranged" in the corresponding sense to be selected for classification. This same implicit use of the continuity of the patterns is also present in the perceptrons (including the discrete ones) and in all the other algorithms and devices for pattern recognition mentioned in the preceding sections.

Limiting the number of image-space subsets subject to consideration and classification is of prime importance for visual, auditory and other patterns of a continuous nature, since without such a limitation the recognition problem would be practically unsolvable for these patterns.

Indeed, when the retina consists of n binary receptors, the image space consists of $I(n) = 2^n$ different images and contains $Q(n) = 2^{2^n}$ different subsets, i.e., possible discrete patterns. With n = 5 the second of these quantities already exceeds four billion, and with a relatively small number of receptors, such as 100, the first quantity is of the order of $10^{30}$ (a one followed by thirty zeros).
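The two counts can be computed exactly with Python's arbitrary-precision integers. The function names below are illustrative.

```python
def image_count(n):
    """I(n): number of distinct images on a retina of n binary receptors."""
    return 2 ** n

def pattern_count(n):
    """Q(n): number of subsets of the image space, i.e. possible discrete patterns."""
    return 2 ** image_count(n)

print(pattern_count(5))                 # 4294967296 = 2**32, more than four billion
print(len(str(image_count(100))) - 1)   # 30: 2**100 is about 1.27 * 10**30
```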

In  view  of  these  discussions  it  becomes  evident  that  the  problem 
of  the  construction  of  devices  which  store  (or  generate)  the  features 
of  all  possible  patterns  for  any  large  values  of  n  is  practically  un¬ 
solvable.  However,  in  the  case  when  the  number  of  (binary)  receptors 
does  not  exceed  ten  or  fifteen  it  is  in  practice  quite  possible  to  con¬ 
struct  a  machine  capable  of  remembering  and  performing  various  opera¬ 
tions  with  all  the  images  (but  not  with  all  the  patterns)  which  can  be 
reproduced  with  the  aid  of  the  corresponding  retina. 

Machines which classify all the possible images obtainable from binary receptors will be termed logical classification machines. We shall describe one of the possible schemes of such machines, proposed by Attlee [2].

For each property of the image the Attlee classification machine contains a so-called discriminative element, which is stimulated when this property is present. Here and hereafter, by an image property we shall mean the presence, in some set M of receptors (depending on the choice of the property), of a definite combination of output signals (zeros and ones). The receptors which do not belong to the set M may have any output signals. The property that the receptors with the numbers $i_1, i_2, \ldots, i_k$ have unit output signals and the receptors with the numbers $j_1, j_2, \ldots, j_l$ have zero output signals will be designated by $(i_1, i_2, \ldots, i_k; j_1, j_2, \ldots, j_l)$.

From these definitions it is clear that specifying a property is equivalent to assigning to the output signal of each retina receptor one of three values: one, zero, or indifferent. If we denote the total number of receptors composing the retina by N, it is easy to see that, with a total number of images equal to $2^N$, the number of their different properties is $3^N$. All the properties of any given image can be obtained by replacing some number of the signals (zeros and ones) composing this image by indifferent signals. The number of such replacements (and hence the number of properties of each image) is clearly equal to the sum $C_N^0 + C_N^1 + \ldots + C_N^N = 2^N$. Thus, each image causes the stimulation of $2^N$ elements of the classification machine.
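These counts can be confirmed by brute-force enumeration for a small retina. The sketch encodes a property as a tuple whose entries are 1, 0, or None (indifferent); this encoding is an illustrative choice.

```python
from itertools import product

def properties(N):
    """All properties: each receptor is required to be 1, 0, or indifferent (None)."""
    return list(product((1, 0, None), repeat=N))

def has_property(image, prop):
    """True if the image agrees with the property on every non-indifferent receptor."""
    return all(p is None or p == x for x, p in zip(image, prop))

N = 4
props = properties(N)
images = list(product((0, 1), repeat=N))
per_image = [sum(has_property(img, p) for p in props) for img in images]
print(len(images), len(props), set(per_image))   # 16 images, 81 = 3**4 properties, {16}
```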

Let us call the property of the image that consists in stimulating the ith receptor the ith elementary property, and let us call any combination of elementary properties a positive property. With N binary receptors there are only N elementary properties in all. It is also clear that specifying a positive property is equivalent to specifying some subset of receptors which produce unit output signals. The total number of positive properties is thus equal to the number of subsets of a set of N elements, i.e., $2^N$.

Attlee calls the classification machine just described, which is able to differentiate any image properties, the binary classification machine, in contrast with the so-called unitary classification machine, which is capable of differentiating only positive properties. It is easy to see that for every binary machine there exists an equivalent (with respect to the classification performed) unitary machine containing twice the number of receptors. We need only add, to each receptor which reacts to a particular elementary property, another receptor which reacts to the absence of this property. Although at first glance it appears that we then need $4^N$ discriminative elements, in actuality many of them are redundant, since they will never be stimulated. After the redundant elements are removed, the number of remaining elements is exactly the same as for the binary machine, i.e., $3^N$. Since binary machines reduce so simply to unitary ones, we may henceforth limit ourselves to unitary machines.
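The reduction can be checked by enumeration. In this sketch the doubled retina places the "absence" receptor for receptor i at index N + i, which is an illustrative layout. Of the 4^N subsets of the 2N receptors, exactly those that never contain both i and N + i can ever be stimulated.

```python
from itertools import product

def unitary_inputs(image):
    """Doubled retina: receptor i fires for x_i = 1, receptor N + i fires for x_i = 0."""
    return tuple(image) + tuple(1 - x for x in image)

def stimulable_subsets(N):
    """Positive properties (subsets of the 2N receptors) stimulated by at least one image."""
    seen = set()
    for image in product((0, 1), repeat=N):
        fired = [k for k, v in enumerate(unitary_inputs(image)) if v]
        # every subset of the fired receptors is a property this image stimulates
        for mask in product((False, True), repeat=len(fired)):
            seen.add(frozenset(k for k, m in zip(fired, mask) if m))
    return seen

N = 4
print(4 ** N, len(stimulable_subsets(N)))   # 256 candidate elements, 81 = 3**4 ever stimulated
```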

It is convenient to picture the discriminative elements of the unitary classification machine with N receptors as neurons having from one to N input channels and capable of being stimulated only when all their input channels are stimulated simultaneously. Each input channel of a neuron is connected to the output channel of some receptor. Neurons with inputs connected to the receptors with the numbers $i_1, i_2, \ldots, i_k$ correspond to the positive property $(i_1, i_2, \ldots, i_k)$ and are stimulated only when this property is present in the image being recognized. For the total number of neurons to be exactly $2^N$, we must also assume the presence of one more neuron without input channels, which is stimulated constantly regardless of the image being recognized. This neuron corresponds to the property formed by the empty set of elementary properties.

Let us consider an arbitrary receptor i and all the positive properties containing the elementary property i. Among these properties there is exactly one property containing one elementary property (namely, the property i itself), exactly $C_{N-1}^1 = N - 1$ properties containing two elementary properties each (all properties of the form (i, j), where j ≠ i), exactly $C_{N-1}^2$ properties containing three elementary properties each, and so on. In the unitary machine one neuron corresponds to each of the positive properties. The total number of neurons connected to the receptor i is therefore expressed by the sum $1 + C_{N-1}^1 + \ldots + C_{N-1}^{N-1} = 2^{N-1}$, which is exactly half of all the neurons in the unitary machine.
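The count can be checked by listing all neurons of a small unitary machine as subsets of receptors. The last lines show the random wiring discussed in the next paragraph: each receptor is connected independently with probability 1/2, which makes each of the 2^N possible neurons equally likely. N = 6 and the chosen receptor are arbitrary.

```python
import random
from itertools import combinations
from math import comb

N = 6
# All 2**N positive properties = all neurons of the unitary machine (empty set included).
neurons = [frozenset(c) for k in range(N + 1) for c in combinations(range(N), k)]
i = 2                                        # an arbitrary receptor
connected = sum(i in s for s in neurons)
print(len(neurons), connected, connected / len(neurons))   # 64 32 0.5
print(1 + sum(comb(N - 1, k) for k in range(1, N)))        # 1 + C(5,1) + ... + C(5,5) = 32

# Random wiring: each input channel is connected to each receptor with probability 1/2,
# which makes every one of the 2**N neurons equally likely.
rng = random.Random(0)
random_neuron = frozenset(r for r in range(N) if rng.random() < 0.5)
print(sorted(random_neuron))
```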

Selecting a neuron at random, with all the neurons considered equiprobable, we conclude that the probability that the selected neuron is connected to any given receptor i is equal to 1/2. Thus, a connection scheme which is to a certain degree close to that of the unitary classification machine can be obtained by connecting the neurons to the receptors at random, with equal probabilities of connection and nonconnection of the input channels of a given neuron to a given receptor.

The classification machine described above (either binary or unitary) contains no elements of self-organization or self-improvement whatsoever. Therefore, following Attlee, we introduce changes and additions into the scheme of the unitary machine, after which this machine is converted into the so-called conditional probability machine.

To simplify the notation we shall denote positive properties of the images by capital Latin letters. If I and J are positive properties, then we use $I \cup J$ to denote the union of these properties, i.e., the positive property consisting of all the elementary properties occurring in the property I, in the property J, or in both at the same time. We use $I \cap J$ to denote the intersection of the properties I and J, i.e., the positive property consisting of all those and only those elementary properties which occur simultaneously in both the property I and the property J.

Let us now assume that some training sequence, i.e., simply speaking, some sequence of images, is applied to the input of some unitary machine. Generally speaking, not all the terms of this sequence possess the fixed property I. Therefore the neuron corresponding to the property I is stimulated by some terms of the training sequence and not by others. The ratio of the number of terms of the training sequence possessing the property I (and, consequently, stimulating the indicated neuron) to the total number of terms of this sequence is naturally called the frequency of the property for the given training sequence (which we shall also call the training history). To distinguish it more clearly from the conditioned frequency introduced later, we shall call the frequency just defined the unconditioned frequency.

We designate the unconditioned frequency of the property I by p(I); here the training history is assumed to be fixed.

Let us impose on the neurons of the unitary classification machine the additional function of computing the unconditioned frequency of appearance of the properties corresponding to them. If the image applied to the machine input possesses some property I, then the neuron corresponding to this property, after computing and memorizing the unconditioned frequency of the property I, delivers at the given moment an output signal equal to one. If, however, this neuron is not stimulated (i.e., if the current image does not possess the property I), then its output signal is the value of the unconditioned frequency of the property I stored in the neuron.

With the indicated additions and alterations the unitary machine acquires certain features which are characteristic, if not of self-organizing automata, then in any case of self-adaptive automata. A further improvement involves the computation of the so-called conditioned frequencies of the properties being classified by the unitary machine.

The conditioned frequency p(I/J) of the property I with respect to the property J is the ratio of the number of joint appearances of the properties I and J (i.e., in other words, of appearances of the property $I \cup J$) to the total number of appearances of the property J:

$$p(I/J) = \frac{p(I \cup J)}{p(J)}. \qquad (107)$$
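A minimal sketch of the unconditioned frequency p(I) and of equation (107), using exact fractions. The training history is hypothetical, and each image is represented by its maximal positive property (the set of receptors with output 1), so that an image possesses the property I exactly when I is a subset of it.

```python
from fractions import Fraction

def freq(I, history):
    """Unconditioned frequency p(I): share of images in the history possessing property I."""
    return Fraction(sum(I <= img for img in history), len(history))

def cond_freq(I, J, history):
    """Conditioned frequency p(I/J) = p(I u J) / p(J), equation (107); needs p(J) > 0."""
    return freq(I | J, history) / freq(J, history)

# Hypothetical training history on 3 receptors; each image is the set of receptors with output 1.
history = [frozenset(s) for s in ({0, 1}, {0}, {0, 1, 2}, {1}, {0, 1}, {2})]
I, J = frozenset({1}), frozenset({0})
print(freq(J, history), cond_freq(I, J, history))   # 2/3 3/4
```

The value 3/4 agrees with direct counting: property J = {0} occurs in four images, and three of them also have I = {1}.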

As before, we shall consider that the neurons of the unitary machine compute and remember the unconditioned frequencies of the properties corresponding to them. Just as before, when a neuron is directly stimulated (i.e., when the current image possesses the property corresponding to this neuron), the neuron delivers a signal equal to one. However, all the neurons which are not directly stimulated must now deliver not the unconditioned but the conditioned frequencies of the properties corresponding to them. The only question is: relative to which property are these conditioned frequencies to be calculated? It is easy to see that the most natural choice for the property J is the maximal positive property of the image being considered, i.e., the union of all the elementary properties which the given image possesses. Indeed, only the maximal property completely determines the image corresponding to it, so that the conditioned frequencies will actually be referred to the frequency of appearance of this image.

Let  us  introduce  the  concept  of  subordination  for  the  neurons  of 
the  unitary  machine.  We  say  that  the  neuron  A  is  subordinate  to  the 
neuron  B  if  the  property  J  corresponding  to  the  neuron  B  includes  in 
itself  all  the  elementary  properties  from  the  property  I  corresponding 
to  the  neuron  A. 

The neuron Q, corresponding to the maximal positive property of some fixed image, is obviously characterized by the fact that all the neurons directly stimulated by this image are subordinate to it, while it is not subordinate to any of these neurons except, of course, itself. All the neurons to which the neuron Q is subordinate constitute the so-called superset M(Q) of this neuron. In the case under consideration none of the neurons P from the set M(Q), except Q itself, is directly stimulated, and therefore each must deliver a signal equal to the conditioned frequency p(I/J) of the property I, corresponding to the neuron P, relative to the property J, corresponding to the neuron Q. From the definition of the superset it follows directly that $I \cup J = I$. Therefore, as a result of equation (107),

$$p(I/J) = \frac{p(I)}{p(J)}. \qquad (108)$$

Thus, for all the neurons from the superset M(Q) of the neuron Q the output signals can be determined from equation (108): to obtain the output signal of a neuron P from M(Q), the unconditioned frequency of its property stored in it must be divided by the unconditioned frequency stored in the neuron Q. Attlee calls this operation for obtaining the output signals of the neurons from the superset M(Q) the supercontrol operation. The supercontrol operation does not lead to a contradiction even for the neuron Q itself, since in this case the output signal computed from equation (108) is obviously equal to unity, which agrees with the known output signal of the neuron Q obtained from the condition of its direct stimulation.
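The supercontrol operation of equation (108) can be sketched for a three-receptor machine with a hypothetical training history. The sketch assumes that the history already includes the current image, so that p(J) > 0. The current image fires only receptor 0, so J = {0} and M(Q) consists of every property containing receptor 0.

```python
from fractions import Fraction
from itertools import combinations

N = 3
history = [frozenset(s) for s in ({0, 1}, {0}, {0, 1, 2}, {1}, {0, 1}, {2})]   # hypothetical

def freq(I):
    """Unconditioned frequency p(I) over the stored history."""
    return Fraction(sum(I <= img for img in history), len(history))

def superset(J):
    """M(Q): all positive properties I containing J (neurons to which Q is subordinate)."""
    rest = [r for r in range(N) if r not in J]
    return [J | frozenset(c) for k in range(len(rest) + 1) for c in combinations(rest, k)]

def supercontrol(J):
    """Output of every neuron in M(Q) by equation (108): p(I) / p(J)."""
    pJ = freq(J)
    return {I: freq(I) / pJ for I in superset(J)}

out = supercontrol(frozenset({0}))   # current image: only receptor 0 fires
for I, v in sorted(out.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(I), v)
```

As the text notes, the neuron Q itself (property {0}) receives the value 1, consistent with its direct stimulation.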

The set of all the neurons subordinate to a given neuron P will be called the subset of this neuron and denoted by L(P). Using the concept of the subset, it is not difficult to determine how to obtain the output signals of all the neurons which are not directly stimulated at the given moment and do not belong to the superset of the neuron Q corresponding to the maximal positive property of the image considered at the given step.

Let us denote the set of all such neurons by K, and let R be any neuron from K. If, as before, J denotes the property corresponding to the neuron Q and I the property corresponding to the neuron R, then the neuron P corresponding to the property $I \cup J$ obviously belongs to the superset M(Q).

As a result of equations (107) and (108), the output signal of the neuron R is equal to the output signal of the neuron P. Moreover, it is clear that the neuron R belongs to the subset L(P) of the neuron P. This suggests that all the output signals not yet determined (those of the neurons of the set K) can be obtained simply by transferring the output signals of the neurons from the superset M(Q) to the neurons of their subsets. By analogy with supercontrol, it is natural to call this transfer of output signals to the subsets the subcontrol operation.
However, the subcontrol operation defined in this way does not determine the output signals uniquely, since a neuron R from K generally belongs not to one but to several subsets of different neurons from M(Q). To eliminate the resulting ambiguity, we note that within the set H of all the neurons from M(Q) to which the neuron R is subordinate, the neuron P of interest to us (corresponding to the union of the properties I and J, as defined above) is subordinate to all the remaining neurons of this set. It is easy to see that the property $I \cup J$ then has the highest unconditioned frequency among the properties corresponding to the neurons from H. As a result of equation (108), this means that the output signal of the neuron P is the highest among the output signals of all the neurons from the set H.

Thus,  to  ensure  error-free  functioning  of  the  machine  the  subcon¬ 
trol  operation  must  be  supplemented  by  still  another  rule:  if  as  the 
result  of  the  subcontrol  operation  several  different  output  signals  are 
transferred  to  some  neuron,  the  largest  of  them  must  be  selected. 

Let us emphasize once again that the subcontrol operation is not applied to the neurons whose output signals are determined from the condition of direct stimulation or by the supercontrol operation.
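The three rules can be combined into a sketch that computes the output of every neuron for the current image: direct stimulation, supercontrol (108), and subcontrol with the "largest value wins" rule. The history and the three-receptor retina are hypothetical. The final check confirms the argument of the text: with the maximum rule, every neuron ends up emitting its conditioned frequency p(I/J) from (107).

```python
from fractions import Fraction
from itertools import combinations

N = 3
history = [frozenset(s) for s in ({0, 1}, {0}, {0, 1, 2}, {1}, {0, 1}, {2})]   # hypothetical
all_props = [frozenset(c) for k in range(N + 1) for c in combinations(range(N), k)]

def freq(I):
    return Fraction(sum(I <= img for img in history), len(history))

def outputs(J):
    """Outputs of all neurons when the current image has maximal positive property J."""
    pJ = freq(J)
    sup = {P: freq(P) / pJ for P in all_props if J <= P}       # supercontrol, (108)
    out = {}
    for I in all_props:
        if I <= J:
            out[I] = Fraction(1)                               # direct stimulation
        elif J <= I:
            out[I] = sup[I]                                    # superset M(Q)
        else:
            # subcontrol: signals transferred from every P in M(Q) with I subordinate to P;
            # the largest transferred value is kept
            out[I] = max(v for P, v in sup.items() if I <= P)
    return out

J = frozenset({0})
out = outputs(J)
# every neuron now emits the conditioned frequency p(I/J) of (107)
print(all(out[I] == freq(I | J) / freq(J) for I in all_props))   # True
```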

The unitary machine with all the described additions and alterations in the laws of neuron functioning is called a conditional probability machine. This name emphasizes the fact that, for sufficiently long training histories (which we will assume are usually generated according to the scheme of independent trials), the conditioned frequencies computed by the machine tend, with an arbitrarily high confidence level, to the corresponding conditional probabilities of some properties with respect to others. The conditional probability machine is used to simulate the processes of the development and decay of so-called conditioned reflexes. Let us assume that throughout the entire training history of the machine the property J has almost always appeared together with another property I. In that case the conditioned frequency p(J/I) of the property J with respect to the property I will be close to unity. If the property I now appears without the property J, then the reaction (output signal) of the neuron Q corresponding to the property J will differ little in intensity from the reaction of the neuron P, which corresponds to the property I and is consequently directly stimulated by this property. We will say in this case that a conditioned reflex for the property J with respect to the property I has developed in the machine.

If, after the indicated reflex has developed, it is not reinforced over a sufficiently large number of steps of the subsequent training history (i.e., the property I appears without the property J), then the conditioned frequency p(J/I) will diminish and may in the course of time become negligibly small. At the next appearance of the property I without the property J the reaction of the neuron Q will then be very slight. In this case we shall speak of the decay of the corresponding reflex.
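The development and decay of a reflex can be traced with running counts. In this hypothetical setup, I is receptor 0 and J is receptor 1: the machine first sees n_joint images containing both, then n_alone images containing I only. The comparison shows how strongly the speed of decay depends on the length of the preceding history when plain frequencies are used.

```python
def reflex_curve(n_joint, n_alone):
    """Conditioned frequency p(J/I) after n_joint images with I and J, then I alone.

    Property I is receptor 0 and J is receptor 1 (hypothetical choice); the
    frequency is kept as running counts, so p(J/I) = count(I u J) / count(I).
    """
    count_I = count_IJ = 0
    curve = []
    for step in range(n_joint + n_alone):
        image = {0, 1} if step < n_joint else {0}
        count_I += int(0 in image)
        count_IJ += int({0, 1} <= image)
        curve.append(count_IJ / count_I)
    return curve

early = reflex_curve(2, 10)
late = reflex_curve(1000, 10)
print(early[-1], late[-1])   # 2/12 versus 1000/1010
```

After only two reinforcements, ten unreinforced steps almost destroy the reflex. After a thousand reinforcements the same ten steps barely change it.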

These processes of the development and decay of conditioned reflexes are, at least from a purely superficial point of view, quite similar to the analogous processes taking place in living organisms, in particular in the human nervous system. At the same time there are several differences. One of the essential differences is that in the scheme we have described the rate of development and decay of conditioned reflexes is too high at the very beginning of the training process and too low at its end. This situation can be rectified by replacing the unconditioned and conditioned frequencies by the so-called pseudofrequencies.

The unconditioned pseudofrequency of a given property I is a quantity r lying between zero and one, which increases when images with the property I appear and decreases when images without the property I appear. The quantity r must tend to one if, from some moment on, all the terms of the training sequence have the property I, and must tend to zero if, from some moment on, none of the terms of the training sequence possess the property I (we assume here that the training sequence can be extended with new terms over an arbitrarily long period of time).

This definition is satisfied, in particular, by the unconditioned frequency, which can therefore be considered one of the possible ways of specifying the unconditioned pseudofrequency. However, it is simpler and more convenient to take as the unconditioned pseudofrequency the quantity r_n = r_n(I) given by the recurrence relations

$$
r_{n+1} = \begin{cases} r_n + \alpha\,(1 - r_n), \\ (1 - \beta)\, r_n \end{cases} \qquad (n = 0, 1, 2, \ldots), \qquad (109)
$$


where α and β are positive constants strictly less than unity. If the property I appears at the (n + 1)th step of the training, we use the upper line of these relations; otherwise we use the lower line. The initial value r_0 of the quantity r_n can be selected arbitrarily on the interval (0, 1).
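A minimal Python sketch of the recurrence (109) shows why the pseudofrequency reacts at a constant rate, whereas the ordinary frequency m/n changes by an amount that shrinks roughly like 1/n as training proceeds. It assumes the lower line has the form (1 − β)r_n; the function name `pseudofrequency` and the parameter values are illustrative choices, not taken from the text.

```python
def pseudofrequency(appearances, alpha, beta, r0=0.5):
    """Return [r_0, r_1, ..., r_N] for a sequence of training steps.

    appearances[n] is True if the property I appears at step n + 1.
    Assumed form of (109):
        r_{n+1} = r_n + alpha * (1 - r_n)   if I appears,
        r_{n+1} = (1 - beta) * r_n          otherwise.
    """
    if not (0 < alpha < 1 and 0 < beta < 1 and 0 < r0 < 1):
        raise ValueError("alpha, beta and r0 must lie strictly between 0 and 1")
    r = r0
    history = [r]
    for appears in appearances:
        # Upper line: move a fixed fraction alpha of the way toward 1.
        # Lower line: shrink by the fixed factor (1 - beta) toward 0.
        r = r + alpha * (1 - r) if appears else (1 - beta) * r
        history.append(r)
    return history


# Twenty steps in which I appears, followed by twenty in which it does not.
trace = pseudofrequency([True] * 20 + [False] * 20, alpha=0.2, beta=0.2)
print(f"after reinforcement: {trace[20]:.3f}, after extinction: {trace[40]:.3f}")
```

Because each step moves r by a fixed fraction of its remaining distance, the speed of "learning" and "forgetting" is governed only by α and β, not by how long training has already lasted.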

If now in the conditional probability machine described above we replace the unconditioned frequencies of the properties by the pseudofrequencies calculated with the aid of equation (109), and leave the method of functioning of the neurons as before, then by a suitable choice of the quantities α and β we can come much closer to imitating the biological processes of the development and decay of the conditioned reflexes than we can with the original conditional probability machine. The conditioned frequencies of the properties will, as before, be computed using equation (107); however, in place of the unconditioned frequencies, the right side of this equation will now contain the unconditioned pseudofrequencies. Therefore equation (107) will now give not the conditioned frequency but a quantity which it is natural to term the conditioned pseudofrequency of one property with respect to another.

Moreover, by defining the concept of the conditioned pseudofrequency in a somewhat different way, we can improve the conditional probability machine so that it determines the conditioned pseudofrequencies of the properties directly, without preliminary calculation and memorizing of their unconditioned frequencies or pseudofrequencies. To do this we introduce into the unitary classification machine paired directed connections between the neurons. Each such connection is assigned a weight, which can take any real value on the interval (0, 1). These weights can vary at every step of the training. We shall denote the weight of the connection from the neuron P to the neuron Q at the nth training step by λ_n(P, Q). We note that the weights λ_n(P, Q) and λ_n(Q, P) are not necessarily equal to one another.

The law of variation of the connection weights is specified by the following relations, defined for all values of n = 0, 1, 2, ...:

$$
\lambda_{n+1}(P, Q) = \begin{cases}
\lambda_n(P, Q) & \text{if at the } (n+1)\text{th training step the neuron } P \text{ is not stimulated;} \\
\lambda_n(P, Q) + \alpha\,\bigl(1 - \lambda_n(P, Q)\bigr) & \text{if at the } (n+1)\text{th training step both neurons } P \text{ and } Q \text{ are stimulated;} \\
(1 - \beta)\,\lambda_n(P, Q) & \text{if at the } (n+1)\text{th training step neuron } P \text{ is stimulated and neuron } Q \text{ is not.}
\end{cases}
\qquad (110)
$$

Here, just as in equation (109), α and β are positive constants strictly smaller than unity, and the initial weight λ_0(P, Q) can be chosen arbitrarily on the interval (0, 1).
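The law (110) can be written as a one-step update. This Python sketch assumes the third case has the form (1 − β)λ_n(P, Q); `update_weight` is a hypothetical name. Running the same stream of events through λ(P, Q) and λ(Q, P), with the roles of P and Q swapped, shows why the two weights need not be equal.

```python
def update_weight(weight, p_stimulated, q_stimulated, alpha, beta):
    """One training step of the law (110) for the weight lambda_n(P, Q)."""
    if not p_stimulated:
        return weight                          # P silent: weight unchanged
    if q_stimulated:
        return weight + alpha * (1 - weight)   # P and Q together: reinforcement
    return (1 - beta) * weight                 # P without Q: nonreinforcement


# Events are (P stimulated, Q stimulated). P never occurs without Q,
# but Q often occurs without P.
events = [(True, True), (False, True), (False, True)] * 10
w_pq = w_qp = 0.5
for p, q in events:
    w_pq = update_weight(w_pq, p, q, alpha=0.3, beta=0.3)
    w_qp = update_weight(w_qp, q, p, alpha=0.3, beta=0.3)  # roles swapped
print(f"lambda(P, Q) = {w_pq:.3f}, lambda(Q, P) = {w_qp:.3f}")
```

Here λ(P, Q) approaches 1, while λ(Q, P) settles much lower, because the appearances of Q without P count as nonreinforcement only for the connection leaving Q.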

In a purely qualitative respect the weight λ_n(P, Q) behaves exactly like the conditioned frequency p_n(J/I) of the property J, corresponding to the neuron Q, with respect to the property I, corresponding to the neuron P. Indeed, with the simultaneous appearance of the properties I and J both the quantity λ_n(P, Q) and the quantity p_n(J/I) increase. Both these quantities diminish with the appearance of the property I without the simultaneous appearance of the property J. If the first situation (simultaneous appearance of the properties I and J) repeats itself many times in a row, then the quantities λ_n(P, Q) and p_n(J/I) tend to unity. With numerous repetitions of the second situation (the appearance of I without J) both these quantities tend toward zero. Therefore it is natural to term the quantity r_n(J/I) = λ_n(P, Q) the conditioned pseudofrequency of the property J with respect to the property I.

If we require that in the conditional probability machine the neurons which are not directly stimulated deliver as their output signals not the conditioned frequencies but the conditioned pseudofrequencies of the properties corresponding to them with respect to the maximal positive property I of the image being considered at the given step, then to accomplish this it is sufficient to transfer from the neuron P, corresponding to the property I, the output signals λ_n(P, R) to all the neurons R which are not directly stimulated. Here it is convenient to consider that the neuron P sends its unit output signal along all the connections of the form (P, R), and that this signal is attenuated along each connection by multiplication by the quantity λ_n(P, R), which by assumption is always less than unity.

It is not difficult to see that this mechanism for the formation of the conditioned reflexes still has another essential deficiency: as a result of the assumptions we have made, the conditioned reflexes are formed only with respect to entire images and not to their individual properties. As is known, in biological systems the situation is different. There the reflexes are most frequently formed precisely with respect to partial (not maximal) properties of the images.

We can, it is true, in the conditional probability machine of either of the types described above fix once and for all the property I with respect to which the conditioned frequencies and pseudofrequencies are computed. In this case the property need not be the maximal positive property, and the conditioned reflex will be developed precisely with respect to a property of the images and not to the image itself. All the definitions made above of the laws of functioning of the conditional probability machine also apply to this case; only now there is no need to make a special search for the neuron corresponding to the maximal positive property of the image being considered: whenever the property I appears, the role of this neuron is always played by the neuron corresponding to the property I. However, when the property I does not appear, all the neurons (except those which are directly stimulated) must, by definition, deliver not the conditioned but the unconditioned frequencies (or pseudofrequencies) of the properties corresponding to them.

In his original definition Uttley considered precisely this method of functioning of the conditional probability machine. However, in avoiding one deficiency we come upon another, and again obtain a system which differs significantly from biological systems, which are capable of developing conditioned reflexes with respect to several properties and not just to one of them.

To avoid these last deficiencies, it is sufficient to remove the restriction, present in the last of the schemes described, that the output signals of the neurons which are not directly stimulated are formed from the output signal of the single neuron P alone. In its place we make the following assumption.

To every neuron Q which is not directly stimulated we transfer the signals λ_n(P_i, Q) from all the directly stimulated neurons P_i (i = 1, 2, ...). The output signal of the neuron Q is the largest of the signals λ_n(P_i, Q) (i = 1, 2, ...).

The device which realizes this mechanism for generating the neuron output signals and for altering the weights of the connections in accordance with equation (110) is termed a conditioned reflex machine. We can hope that it reflects many important properties of the real neural networks which make up the nervous systems of animals and even the human nervous system.
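A compact Python sketch of the conditioned reflex machine as just defined: directly stimulated neurons output 1, every other neuron Q outputs the largest of the signals λ_n(P_i, Q), and training applies (110) to every ordered pair of neurons. The class name, the property names, the initial weight 0.5 and the (1 − β) form of the nonreinforcement case are assumptions made for the illustration; outputting 0 when no neuron is stimulated is likewise our own convention.

```python
from itertools import permutations


class ConditionedReflexMachine:
    """Sketch: one neuron per property, a weight lambda(P, Q) for every ordered pair."""

    def __init__(self, neurons, alpha, beta, initial_weight=0.5):
        self.neurons = list(neurons)
        self.alpha, self.beta = alpha, beta
        self.weight = {pair: initial_weight for pair in permutations(self.neurons, 2)}

    def outputs(self, stimulated):
        """Output 1 for directly stimulated neurons, max_i lambda(P_i, Q) for the rest."""
        result = {}
        for q in self.neurons:
            if q in stimulated:
                result[q] = 1.0
            else:
                result[q] = max((self.weight[(p, q)] for p in stimulated), default=0.0)
        return result

    def train(self, stimulated):
        """Apply the weight law (110) to every connection for one training step."""
        for (p, q), w in list(self.weight.items()):
            if p not in stimulated:
                continue                                   # first case: unchanged
            if q in stimulated:
                self.weight[(p, q)] = w + self.alpha * (1 - w)
            else:
                self.weight[(p, q)] = (1 - self.beta) * w


machine = ConditionedReflexMachine(["bell", "light", "food"], alpha=0.3, beta=0.3)
for _ in range(10):
    machine.train({"bell", "food"})      # bell reinforced by food
    machine.train({"light", "food"})     # light reinforced by food as well
print(machine.outputs({"bell"}))
print(machine.outputs({"light"}))
```

Both "bell" and "light" end up evoking a strong "food" output, i.e., a reflex is formed with respect to each of several properties, which is exactly what the single-neuron-P scheme could not do.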

In real neural networks, of course, not all the connections required by the complete circuit of the conditioned reflex machine are present. Further improvement of the laws of variation of the connection weights is also possible. In particular, the first of equations (110) should apparently be replaced by the equation λ_{n+1}(P, Q) = (1 − γ)λ_n(P, Q), where γ is a very small positive constant. After this refinement the law of variation of the connection weights will reflect the tendency of the connections to weaken in the course of time even in the absence of direct nonreinforcement of the reflex which the connection represents.
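The refinement changes only the first case of the weight law. A minimal sketch, again assuming the (1 − β) form for nonreinforcement; the function name and the value γ = 0.001 are arbitrary illustrative choices. When P stays silent for m steps, the weight simply shrinks by the factor (1 − γ)^m.

```python
def update_weight_with_forgetting(weight, p_stimulated, q_stimulated,
                                  alpha, beta, gamma):
    """Weight law (110) with the first case replaced by slow decay (1 - gamma)."""
    if not p_stimulated:
        return (1 - gamma) * weight            # slow spontaneous forgetting
    if q_stimulated:
        return weight + alpha * (1 - weight)   # reinforcement
    return (1 - beta) * weight                 # nonreinforcement


w = 0.9
for _ in range(1000):                          # P never stimulated for 1000 steps
    w = update_weight_with_forgetting(w, False, False, alpha=0.3, beta=0.3, gamma=0.001)
print(f"{w:.4f}")                              # about 0.9 * 0.999**1000
```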

If in the conditioned reflex machine we drop the requirement of completeness, i.e., the presence of neurons for all positive properties without exception of all possible images without exception, then such incomplete conditioned reflex machines can be used with a large number of receptors. On this basis the conditioned reflex machines can possibly be used to solve the visual pattern recognition problems considered in the preceding sections. To obtain effective results in this direction it is necessary to make a purposeful selection of the properties to which the neurons in such an incomplete machine will correspond.

Still another problem associated with classification systems is of interest. In the classification systems described so far, all the properties of the images are perceived simultaneously with the images themselves. In many cases, however, we must deal with images whose properties are manifested gradually, in the course of the training. Moreover, in these cases the images are, as a rule, so remote from their visual prototypes that we shall term them concepts rather than images.

Such a situation arises in the design of self-improving systems for recognizing the meaning of phrases (see Glushkov, Trishchenko and Stogniy [30]). In the simplest case, when we consider phrases consisting of only a subject and a predicate, the concepts being recognized may be taken to be the nouns used as subjects, and their properties may be the possibility or impossibility of their meaningful combination with the various verbs appearing as predicates.

If the verb list is fixed and contains n different verbs, then each concept (noun) can be characterized by a line of compatibility with these verbs of length n. In the ith position of this line there will be a one or a zero according to whether the combination of the given noun with the ith verb of the list (i = 1, 2, ..., n) is meaningful or not.
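For illustration, a verb line is simply a 0/1 vector indexed by the fixed verb list. The verb list and the function name below are hypothetical.

```python
VERBS = ["live", "think", "go", "fly"]   # hypothetical fixed verb list, n = 4


def verb_line(compatible_verbs, verbs=VERBS):
    """Return the line of compatibility: 1 at position i if the noun
    combines meaningfully with verbs[i], else 0."""
    return tuple(1 if verb in compatible_verbs else 0 for verb in verbs)


print(verb_line({"live", "think", "go"}))   # (1, 1, 1, 0)
print(verb_line({"live", "fly"}))           # (1, 0, 0, 1)
```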

We shall call these lines the verb lines. The task of the self-improving system in this case is to predict the largest possible number of properties of the concepts under consideration on the basis of limited experience. If the experience (training history) consists in communicating to the system, one after another, meaningful combinations of randomly selected nouns and verbs from the given lists, then as such information arrives certain places in the verb lines of the selected concepts (nouns) will gradually be filled with ones. In this case the training problem can be treated as a problem of reconstructing the structure of the unitary machine for the classification of the properties of the selected concepts. To solve this problem it is natural to organize the process of combining concepts which are compatible with the same verbs into classes corresponding to new generalized concepts, which may not be present in the original list. For example, as the result of combining several concepts (say, "father," "son," "student," "professor," and so on) according to the criterion of compatibility with the verbs "live" and "think," there arises the concept "human." If after the formation of a particular class it is seen that certain of its representatives have some new elementary property (for example, the possibility of combining with the verb "go"), then this property can be extended to the entire class (i.e., to all the concepts belonging to this class). Of course, errors may occur in this extension of the properties. To eliminate the resulting errors we introduce a process in which the machine independently composes new phrases by combining other (randomly selected) concepts from the class in question with the verb whose compatibility was extended over the entire class. The sense (or nonsense) of these combinations must be communicated to the machine by the human teacher.
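The class-forming and extrapolation steps can be sketched with a set of known verbs per noun. This is a simplified illustration under our own assumptions, not the algorithm of [30]: `form_class`, `extrapolate`, `apply_answers`, the toy vocabulary and the teacher's answers are all hypothetical. Nouns whose known verbs include every criterion verb form a class; if some member is known to take a new verb, the untested combinations become questions for the teacher.

```python
def form_class(known, criteria):
    """Nouns known to combine meaningfully with every verb in `criteria`."""
    return {noun for noun, verbs in known.items() if criteria <= verbs}


def extrapolate(known, members, verb):
    """If some member takes `verb`, propose it for the other members.

    Returns the (noun, verb) phrases the machine should put to the teacher."""
    if not any(verb in known[noun] for noun in members):
        return []
    return sorted((noun, verb) for noun in members if verb not in known[noun])


def apply_answers(known, answers):
    """Record the teacher's verdicts: True means the phrase is meaningful."""
    for (noun, verb), meaningful in answers.items():
        if meaningful:
            known[noun].add(verb)


known = {
    "father": {"live", "think", "go"},
    "son": {"live", "think"},
    "student": {"live", "think"},
    "stone": {"lie"},
}
human = form_class(known, {"live", "think"})
questions = extrapolate(known, human, "go")
print(sorted(human))    # ['father', 'son', 'student']
print(questions)        # [('son', 'go'), ('student', 'go')]
apply_answers(known, {q: True for q in questions})   # the teacher confirms both
```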

Thanks to the existence in language of connections similar to the one expressed by a statement of the type "almost all those who think can also speak," the described process, with judiciously chosen methods of forming the classes, extrapolating the properties, and composing new phrases, enables the machine to separate phrases into meaningful and meaningless ones correctly with high probability. It is essential here that this separation be performed for all phrases which can be constructed from the given sets of nouns and verbs. In particular, these include phrases which were neither composed by the machine as questions for the teacher nor communicated by the teacher to the machine in the training process.

Experiments on training a machine to recognize the meaning of phrases, not only of the simplest construction just described but also of more complex structure, have been conducted successfully at the Institute of Cybernetics of the Academy of Sciences of the Ukrainian SSR [30]. Under various assumptions about the structure of the language (in the present case, the sets of verb lines) we can estimate the learning rate of the realized algorithms, similarly to the way this was done for the α-perceptrons in the preceding section. However, the corresponding arguments are quite cumbersome and considerably less graphic than in the case of the perceptrons.

§9. SELF-ORGANIZATION AND SELF-ADAPTATION. METHODS OF SOLUTION OF COMPLEX VARIATIONAL PROBLEMS

In §1 of the present chapter we introduced the concept of the algorithm system as the natural form in which the properties of self-organization and self-improvement are combined. Let us consider this concept in more detail. For most problems encountered in practice it is advisable to distinguish self-organization proper from so-called self-adaptation, which is the simplest case of self-improvement. More precisely, we shall distinguish the simplest type of self-improvement, based on self-adaptation, from the higher type of self-improvement, based on self-organization.

The difference here is that self-improvement on the basis of self-adaptation involves varying only certain numerical parameters of the operational algorithm, while self-organization is associated with varying the structure of the algorithm itself. Of course, this difference is to a considerable degree artificial, since with a suitable way of writing the algorithms, variations in the algorithm structure can be reduced to variations of numerical parameters. If, for example, all the algorithms of the class under consideration are numbered, then any alteration of the algorithm reduces to a change of the corresponding number, which can be considered a numerical parameter. However, in spite of this relativity, once some fixed way of writing the algorithms (an algorithmic language) is adopted, the difference can be drawn quite sharply.

We shall present some examples of self-adaptation and of self-improvement of the structure of algorithm schemes. As the first example, let us consider a case of self-adaptation frequently encountered in practice, which carries the special name of extremal regulation. The essence of the problem of extremal regulation consists in supplying to the regulation system those values of certain parameters x_1, x_2, ..., x_n for which a specified function f = f(x_1, x_2, ..., x_n) of these parameters takes an extremal (maximal or minimal) value. Here the function f(x_1, x_2, ..., x_n) also depends on certain other parameters y_1, y_2, ..., y_m, which vary regardless of our wishes and cannot be controlled directly. Because these parameters change, the values of x_1, x_2, ..., x_n which provide the desired extremum of the function f cannot be selected once and for all: they must be altered along with the changes of the parameters y_1, y_2, ..., y_m.

In practice we most frequently encounter the case in which finding the extrema by conventional methods (equating the partial derivatives of the function f to zero and solving the resulting system of equations) is impossible or inexpedient. The reason may be, for example, the absence of an analytic expression for the function f. The question arises of what methods can be used to solve the problem of self-adaptation in this situation. One of the universal methods for this case is the so-called method of steepest descent (or steepest ascent).

The method of steepest descent (ascent) serves to find the minimum (maximum) points of a function of many variables by constructing a special process of successive approximation to these points. Let f(x_1, x_2, ..., x_n) be any differentiable function of n variables. In the space of these variables we select an arbitrary point M(a_1, a_2, ..., a_n) and find approximate values of the partial derivatives f_i = ∂f/∂x_i (a_1, a_2, ..., a_n) at the point M from the equations

$$
f_i \approx \frac{f(a_1, \ldots, a_i + \Delta, \ldots, a_n) - f(a_1, \ldots, a_i - \Delta, \ldots, a_n)}{2\Delta} \qquad (i = 1, 2, \ldots, n),
$$

giving each variable in turn the same increment Δ. Let us find out what increments must be given simultaneously to all the arguments in order to approach the extremum point as closely as possible using only the values of the function f and its derivatives at the point M.
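The derivative estimates from the central differences above can be sketched as a short routine. The name `partial_derivatives` and the default Δ = 1e-5 are our own choices; for a function available only through noisy measurements, Δ would likely have to be chosen with the noise level in mind.

```python
def partial_derivatives(f, point, delta=1e-5):
    """Central-difference estimates of f_i = df/dx_i at `point` (a list of floats)."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += delta       # a_i + delta, other coordinates unchanged
        down[i] -= delta     # a_i - delta
        grads.append((f(up) - f(down)) / (2 * delta))
    return grads


# f(x1, x2) = x1^2 + 3 x2 has partial derivatives (2 x1, 3).
print(partial_derivatives(lambda x: x[0] ** 2 + 3 * x[1], [1.0, 2.0]))
```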

This latter condition is quite essential, since without it the question would be solved trivially: the increments of the variables could be chosen so as to lead us directly to the extremum point. However, we do not know the extremum point, and we are required to decide how to approach it on the basis of information about the behavior of the function in the neighborhood of the selected arbitrary point M(a_1, a_2, ..., a_n). Denoting the desired increments of the variables x_1, x_2, ..., x_n at the point M by Δ_1, Δ_2, ..., Δ_n respectively, and using the expression for the total differential, we obtain for the increment Δf of the function f at the point M the approximate equation

$$
\Delta f \approx f_1 \Delta_1 + f_2 \Delta_2 + \cdots + f_n \Delta_n .
$$

If we agree to take a step toward the extremum point (in the space of the variables x_1, x_2, ..., x_n) only of some fixed length r, then another equation is added to the equation for the increment of the function f:

$$
\Delta_1^2 + \Delta_2^2 + \cdots + \Delta_n^2 = r^2 . \qquad (111)
$$

We must choose the quantities Δ_1, Δ_2, ..., Δ_n so that, subject to equation (111), the quantity Δf reaches its largest value in absolute magnitude (taking the sign into account). Using the method of Lagrange undetermined multipliers, the question reduces to finding the extremum of the function

$$
F = \sum_{i=1}^{n} \left( f_i \Delta_i - \lambda \Delta_i^2 \right) + \lambda r^2
$$

of the variables Δ_1, Δ_2, ..., Δ_n. Differentiating with respect to Δ_i and equating the resulting partial derivatives to zero, we obtain the system of equations

$$
f_i - 2\lambda \Delta_i = 0 \qquad (i = 1, 2, \ldots, n), \qquad (112)
$$

which must, of course, be supplemented by equation (111). From equations (112) we find that

$$
\Delta_i = k f_i , \qquad (113)
$$

where k = 1/(2λ) is a coefficient of proportionality common to all i = 1, 2, ..., n.
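Substituting (113) into (111) gives k²(f_1² + ... + f_n²) = r², so |k| = r / √(f_1² + ... + f_n²). A small Python sketch of the resulting step; `gradient_step` is a hypothetical name, and returning a zero step at a stationary point is our own convention. The brute-force check at the end compares the first-order gain Δf over all unit directions in the plane.

```python
import math


def gradient_step(grads, r, ascent=True):
    """Increments Delta_i = k * f_i of total length r, following (111) and (113)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0:
        return [0.0] * len(grads)              # stationary point: no preferred direction
    k = r / norm if ascent else -r / norm      # sign of k selects ascent or descent
    return [k * g for g in grads]


step = gradient_step([3.0, 4.0], r=1.0)
print(step)
# Among unit steps (cos t, sin t), the first-order gain 3 cos t + 4 sin t
# is largest in the direction of the gradient (about 53.13 degrees).
best = max(range(360), key=lambda d: 3 * math.cos(math.radians(d))
           + 4 * math.sin(math.radians(d)))
print(best)
```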

Depending on the sign of the coefficient of proportionality k, equations (113) determine two opposite directions along which movement from the point M leads to the most rapid (in the vicinity of the point M) increase (with k > 0) or decrease (with k < 0) of the function f. These directions are termed, respectively, the directions of steepest ascent and steepest descent at the point M. The magnitude r of the advance in either of these directions is termed, respectively, the ascent or descent gradient step at the point M.

Depending on whether we must find the maximum or the minimum point of the function f, we select one of these directions (the sign of the parameter k in equations (113)) and move in the selected direction (by increasing the magnitude of the parameter k) until the function f changes the character of its variation along this direction, i.e., switches from increase to decrease or, vice versa, from decrease to increase. In other words, we make the largest advance in the selected direction that still makes the function f change in the desired sense (decreasing when we seek the minimum point of f and increasing when we seek its maximum point).
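One simple way to realize "move until f changes the character of its variation" is to advance in equal substeps and stop at the first substep that no longer improves f. The substep length `h`, the unit-length `direction`, and the function name are assumptions of this sketch; the text does not prescribe how the turning point is detected.

```python
def advance_until_turn(f, point, direction, h, maximize=True, max_steps=100_000):
    """Move from `point` along the unit vector `direction` in substeps of length h
    for as long as f keeps changing in the desired sense; return the last good point."""
    sign = 1.0 if maximize else -1.0
    current = list(point)
    value = f(current)
    for _ in range(max_steps):
        candidate = [c + h * d for c, d in zip(current, direction)]
        candidate_value = f(candidate)
        if sign * (candidate_value - value) <= 0:
            break                  # f has switched from rising to falling (or vice versa)
        current, value = candidate, candidate_value
    return current


# f rises along the x-axis until x = 3 and falls afterwards.
print(advance_until_turn(lambda x: -(x[0] - 3) ** 2, [0.0], [1.0], h=0.5))  # [3.0]
```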

Denoting by N the point obtained as the result of this movement, we perform the same operations on it that were described for the point M. As a result we obtain a new point P, and so on. If the function f is sufficiently smooth, then by continuing this process sufficiently long we arrive in an arbitrarily small neighborhood either of some stationary point of the function, i.e., a point at which all the partial derivatives of the function are equal to zero (this can, of course, happen only if the function has stationary points), or of a point on the boundary of the domain of definition of the function f at which f has a local extremum (relative to the given domain).
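Putting the pieces together gives the iterative procedure M → N → P → ... . This Python sketch rests on assumptions of our own: derivatives are estimated by central differences, the turning point is located by substeps of length r, and r is halved whenever not even one substep improves f, so that the process can settle into an arbitrarily small neighborhood of the stationary point. For descent, apply it to −f.

```python
import math


def partial_derivatives(f, point, delta=1e-5):
    """Central-difference estimates of the partial derivatives f_i at `point`."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += delta
        down[i] -= delta
        grads.append((f(up) - f(down)) / (2 * delta))
    return grads


def steepest_ascent(f, start, r=0.1, r_min=1e-6, max_iter=10_000, max_steps=100_000):
    """Search for a local maximum of f by repeated steepest-ascent moves."""
    point = list(start)
    value = f(point)
    for _ in range(max_iter):
        grads = partial_derivatives(f, point)
        norm = math.sqrt(sum(g * g for g in grads))
        if norm == 0.0 or r < r_min:
            break                                   # (numerically) stationary point reached
        direction = [g / norm for g in grads]       # k > 0: steepest ascent
        moved = False
        for _ in range(max_steps):                  # advance while f keeps increasing
            candidate = [p + r * d for p, d in zip(point, direction)]
            candidate_value = f(candidate)
            if candidate_value <= value:
                break
            point, value, moved = candidate, candidate_value, True
        if not moved:
            r /= 2                                  # refine the step near the extremum
    return point


# The maximum of f(x, y) = -(x - 1)^2 - 2(y + 2)^2 is at (1, -2).
peak = steepest_ascent(lambda v: -(v[0] - 1) ** 2 - 2 * (v[1] + 2) ** 2, [0.0, 0.0])
print([round(c, 3) for c in peak])
```

Like the method in the text, this finds a stationary point near the starting point; nothing in it guarantees that the point found is the absolute extremum.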

Under the assumptions made, the desired point of the (absolute) extremum of the function f is among the points indicated. However, there is no guarantee that applying the procedure described above will lead to the point of absolute extremum on the first attempt. Therefore, by varying the initial point M, we must find (by the method described above) new stationary points of the function, so that by subsequently comparing the values of the function at these points we can select the desired extremum point from among them.

In practice a different route is generally preferred: we select a random series of points M_1, M_2, ..., M_k in the domain of definition of the function f, and choose from among them the one at which the function has the largest value (when we seek maximum points) or the smallest value (when we seek minimum points). Starting from the point thus chosen, we perform the steepest descent or steepest ascent using the procedure described above. With a sufficiently large k (depending on the choice of the function f), the point of absolute (and not merely local) extremum can thus be found with probability arbitrarily close to unity.
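The random preselection step can be sketched as follows. The sampling box, the seed, the function name and the test function (one global maximum near x = 2 and a lower local maximum near x = −2) are illustrative assumptions; the selected point would then serve as the starting point M for steepest ascent.

```python
import math
import random


def best_random_start(f, bounds, k, rng, maximize=True):
    """Sample k uniform points M_1, ..., M_k in the box `bounds` and return the best."""
    points = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(k)]
    return max(points, key=f) if maximize else min(points, key=f)


def f(v):
    # Global maximum near x = 2 (height about 1), local maximum near x = -2 (about 0.5).
    x = v[0]
    return math.exp(-(x - 2) ** 2) + 0.5 * math.exp(-(x + 2) ** 2)


start = best_random_start(f, [(-4.0, 4.0)], k=200, rng=random.Random(0))
print(start)   # lies in the basin of the global maximum with overwhelming probability
```

With a single random start, steepest ascent would end at the lower peak about half the time here; taking the best of many samples makes that failure increasingly unlikely as k grows.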

One of the possible variants of the search for the absolute maximum of a function of two variables is shown in Fig. 15. In this figure the function is specified by its contour lines (lines of equal level), M_1, M_2, M_3 denote the randomly selected initial points (the point M_2 is the highest of them), and N, P, Q denote the successive points obtained from the point M_2 by the steepest ascent method. In this example, after only three steps of steepest ascent we arrive at the point of absolute maximum of the function.

The algorithm system which solves the problem of self-adaptation consists of an operational algorithm A which, receiving an input word (the value of the function f(x_1, x_2, ..., x_n)) that defines the criterion of regulation quality, and possibly the values of some other quantities, delivers an output word consisting of the coordinates of a point M(a_1, a_2, ..., a_n); in the stationary regime (when the function f does not change) this point coincides with the point of absolute extremum of the function. In the nonstationary regime (when the function f varies) the algorithm B comes into play: it searches for the point of absolute extremum of the altered function f by some method (the method of steepest descent or ascent, for example) and replaces the parameters a_1, a_2, ..., a_n delivered by the operational algorithm A with the coordinates of this point.

The  described  system,  consisting  of  the  algorithms  A  and  B,  can 
be  considered  as  a  self-adaptive  system  of  algorithms. 
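A toy Python rendering of the pair A, B for one regulated parameter. Everything here is an assumption for illustration: the quality criterion f(x; y) = −(x − y)² with an uncontrolled, drifting parameter y, the class and method names, and, for brevity, a fixed-step gradient ascent standing in for the steepest-ascent search of algorithm B.

```python
class SelfAdaptiveRegulator:
    """Algorithm A delivers the current settings; algorithm B re-tunes them."""

    def __init__(self, settings, rate=0.25, iterations=60, delta=1e-5):
        self.settings = list(settings)
        self.rate, self.iterations, self.delta = rate, iterations, delta

    def algorithm_a(self):
        """Operational algorithm: output the coordinates a_1, ..., a_n."""
        return list(self.settings)

    def algorithm_b(self, f):
        """Search for the new maximum of the altered f and replace the settings."""
        x = list(self.settings)
        for _ in range(self.iterations):
            grads = []
            for i in range(len(x)):
                up, down = list(x), list(x)
                up[i] += self.delta
                down[i] -= self.delta
                grads.append((f(up) - f(down)) / (2 * self.delta))
            x = [xi + self.rate * g for xi, g in zip(x, grads)]   # fixed-step ascent
        self.settings = x


regulator = SelfAdaptiveRegulator([0.0])
for y in (1.0, 3.0, -2.0):                     # the uncontrolled parameter drifts
    regulator.algorithm_b(lambda v, y=y: -(v[0] - y) ** 2)
    print(y, round(regulator.algorithm_a()[0], 6))
```

Only the numerical settings change when the criterion drifts; the structure of A stays fixed, which is what makes this self-adaptation rather than self-organization in the terminology above.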

Let us now consider an example of self-improvement with variation of the structure of the operational algorithm. Let us assume that the operational algorithm must, for various numerical values of the coefficients p and q of the reduced quadratic equation x^2 + px + q = 0, deliver the roots of this equation, although at the beginning of the operation this algorithm is not yet known. Since the reduced quadratic equation is solved using the well-known formula

$$
x_{1,2} = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q} ,
$$

the desired algorithm can be sought in the class of formulas constructed with the aid of the operations of addition, subtraction, multiplication, division and extraction of the square root from the letters p and q and whole numbers. All such formulas can be numbered after first arranging them in order of increasing complexity: as the number of the formula increases, there is, generally speaking, an increase in the number of operations used in the formula and in the largest magnitude of the integer parameters it contains.

[Fig. 15]

Initially one of the simplest formulas is selected as the operational algorithm, say the formula p + q. Given successive values of the coefficients p and q (p = 3, q = 2, for example), the operational algorithm delivers the corresponding value(s) of the root (in the present case x = p + q = 5). The training algorithm verifies the solution obtained by substituting the value found into the original equation. If the value satisfies the equation, the operational algorithm is retained unchanged. If, however, the solution is found to be incorrect (as in the case just considered: 5^2 + 3·5 + 2 = 42 ≠ 0), then the next formula in order is selected as the operational algorithm.
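The training loop just described can be sketched as follows. Instead of a full numbering of all formulas, which would be far too long to list, the sketch uses a short, hypothetical list of candidates in rough order of complexity; the test problems are our own and all have real roots. Note that the candidate −p/2 happens to pass on a problem with a double root and is discarded only later.

```python
import math

# Hypothetical candidate formulas in rough order of increasing complexity.
CANDIDATES = [
    ("p + q", lambda p, q: p + q),
    ("p - q", lambda p, q: p - q),
    ("-p", lambda p, q: -p),
    ("-q / p", lambda p, q: -q / p),
    ("-p / 2", lambda p, q: -p / 2),
    ("-p / 2 + sqrt(p*p/4 - q)", lambda p, q: -p / 2 + math.sqrt(p * p / 4 - q)),
]


def is_root(x, p, q, tol=1e-9):
    """Training algorithm's check: substitute x into x^2 + p x + q = 0."""
    return abs(x * x + p * x + q) <= tol


def train(problems, candidates=CANDIDATES):
    """Keep the current formula while it works; after a failure take the next one."""
    index = 0
    for p, q in problems:
        _, formula = candidates[index]
        try:
            ok = is_root(formula(p, q), p, q)
        except (ValueError, ZeroDivisionError):
            ok = False                  # e.g. sqrt of a negative number or division by 0
        if not ok:
            # Move on; stay on the last candidate if the list is exhausted.
            index = min(index + 1, len(candidates) - 1)
    return candidates[index][0]


problems = [(3, 2), (-5, 6), (1, -6), (4, -5), (-2, 1), (0, -4), (2, -8), (-7, 12)]
print(train(problems))
```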

It is easy to see that with this organization of the operational and training algorithms, the correct formula for solving the quadratic equation will be established after a finite number of unsuccessful attempts. Since during the search what changes is, generally speaking, the structure of the algorithm (the form of the formula) and not simply its numerical parameters, according to the terminology we have adopted the constructed algorithm system is self-improving on the basis of self-organization, i.e., it represents a higher type of self-improvement than self-adaptation.

In the example considered, the search for the required working
algorithm is performed by simple enumeration of all the algorithms of
an a priori given class. We need not, of course, fix some special class
of algorithms in advance. We can instead perform the sequential
enumeration in the class of all algorithms written in a particular
fixed algorithmic language (for example, in the language of normal
algorithms), but in this case the search time is, as a rule,
considerably longer.

Normally the systems for such a search are realized on high-speed
electronic computers which perform several thousand operations per
second. At this speed the solution of the problem described (finding
the formula for the solution of the reduced quadratic equation) takes
little time. However, as the sought operational algorithms become more
complex, the search time based on enumeration of all the variants
increases so rapidly that carrying out such a search in a reasonable
time becomes impossible, even on high-speed computers.

In this case we resort to multistage search. We first look for some
sufficiently simple component parts (blocks) of the desired operational
algorithm, and then try various combinations of the constructed blocks.
The blocks themselves can be built up from still smaller blocks, so
that the division of the search into individual steps can be continued
even further. Multistage search systems permit the construction of very
complex self-organizing systems which are analogous to the creative
search processes used by man.

Without  going  into  further  detail  concerning  self-improvement  on 
the  basis  of  self-organization,  we  shall  concentrate  our  attention  on 
the  problems  associated  with  self-adaptation. 

The method of steepest descent (or ascent) described above is also a
kind of search. In this search we use a definite strategy (or tactic)
to reduce the enumeration of the various variants leading to the
solution of the problem. In the case considered, the search strategy
amounted to using information on the local properties of the
corresponding function f, which we term the estimating function. The
values of this function can be considered as an estimate of the quality
of the approximation to the required solution found at a particular
step of the search.

If the number of parameters (arguments of the estimating function) is
very large, various difficulties arise in using the method of steepest
descent (ascent) in the simplest form described above. Examples are
getting stuck at secondary minima or maxima and an excessively slow
rate of advance toward the absolute extremum. To eliminate these
difficulties we introduce various alterations and improvements in the
local search strategy described above.

The simplest changes concern the choice of the particular descent
(ascent) gradient step. In particular, it is desirable to continue the
advance in a given direction not until the increment $\Delta f$ of the
estimating function changes sign, but only until the relative magnitude
of this increment, $\Delta f / f$, becomes less in modulus than some
a priori fixed quantity termed the gradient test.
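The following is a minimal Python sketch of steepest descent that uses the gradient test as the stopping rule for the advance along each direction. Several details are assumptions made for illustration:

- the gradient is estimated numerically;
- the advance also stops if f stops decreasing;
- the step is halved when no advance is possible;
- the test function has a positive minimum, so that $\Delta f / f$ is always defined.

```python
import math

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def steepest_descent(f, x, step=0.05, gradient_test=1e-4, max_iter=500):
    x = list(x)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < 1e-12:
            break                                # stationary point reached
        d = [-gi / norm for gi in g]             # unit steepest-descent direction
        moved = False
        while True:
            trial = [xi + step * di for xi, di in zip(x, d)]
            df = f(trial) - f(x)
            # Stop advancing once the relative decrease |df/f| falls below
            # the gradient test (or f no longer decreases at all).
            if df >= 0 or abs(df / f(x)) < gradient_test:
                break
            x, moved = trial, True
        if not moved:
            step /= 2                            # assumption: refine the step
            if step < 1e-8:
                break
    return x

def f(v):
    # Hypothetical test function: minimum value 1 at (1, -2), so f > 0.
    return (v[0] - 1) ** 2 + 2 * (v[1] + 2) ** 2 + 1

print(steepest_descent(f, [4.0, 1.0]))
```

The accuracy reached is limited by the gradient test. A smaller test value brings the final point closer to the minimum, at the cost of more steps.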

In many cases we can divide the arguments of the estimating function f
into two classes. Variations of the variables of the first class lead
to relatively large variations of the value of f, while variations of
the variables of the second class alter this value to a considerably
lesser degree. The methods described above give too low a rate of
advance toward the extremum in the directions corresponding to
variations of the variables of the second class. Figuratively speaking,
we can perform a rapid descent (in the direction corresponding to
variations of the variables of the first class) to the bottom of some
"gully" and then wander more or less randomly about its bottom without
getting any closer, in practice, to the extremum point.

To eliminate this deficiency, Gel'fand and Tsetlin [18] proposed a
special method which they termed the gully method. The essence of the
method is that two points ($X_0$ and $X_1$), rather than one, are selected
on the "slopes of the gully." From these points a steepest descent to
the "bottom of the gully" is performed, and two new points $A_0$ and
$A_1$ result. The points $A_0$ and $A_1$ are connected with a straight
line. Along this line, in the direction of decrease of the estimating
function, we make the so-called gully step. Its magnitude is usually
considerably larger than that of the descent gradient step. This step
leads to a new point $X_2$, from which we again perform a steepest
descent to the point $A_2$ located on the "bottom" of the gully. In the
direction defined by the points $A_1$ and $A_2$ we again make a gully
step, leading to the new point $X_3$. From it we again perform a steepest
descent to the "bottom of the gully," and so on.
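The gully method can be sketched in the same way. In this sketch, the local descent to the bottom of the gully is a fixed number of plain gradient steps. The rate 0.002 was chosen to be stable for the example function. When a gully step is unsuccessful, the search turns back and halves the gully step. That safeguard is an assumption of the sketch, not part of the description above. The test function is hypothetical: a narrow gully along the line y = x with its minimum at (3, 3).

```python
import math

def descend(grad, x, rate=0.002, steps=50):
    """Plain gradient steps: a quick slide to the 'bottom of the gully'."""
    for _ in range(steps):
        g = grad(x)
        x = [xi - rate * gi for xi, gi in zip(x, g)]
    return x

def gully_method(f, grad, x0, x1, gully_step=1.0, iterations=60):
    a_prev, a_cur = descend(grad, x0), descend(grad, x1)   # points A0, A1
    for _ in range(iterations):
        if f(a_cur) > f(a_prev):                 # orient the line toward smaller f
            a_prev, a_cur = a_cur, a_prev
        line = [c - p for c, p in zip(a_cur, a_prev)]
        length = math.hypot(*line)
        if length < 1e-12:
            break
        x_new = [c + gully_step * s / length for c, s in zip(a_cur, line)]
        a_new = descend(grad, x_new)             # descend again to the bottom
        if f(a_new) < f(a_cur):
            a_prev, a_cur = a_cur, a_new         # accept the gully step
        else:
            a_prev = a_new                       # assumption: turn back and
            gully_step /= 2                      # shorten the gully step
    return a_cur

def f(v):
    # Hypothetical gully along y = x with its minimum at (3, 3).
    return 100 * (v[1] - v[0]) ** 2 + (v[0] - 3) ** 2

def grad_f(v):
    return [-200 * (v[1] - v[0]) + 2 * (v[0] - 3), 200 * (v[1] - v[0])]

print(gully_method(f, grad_f, [0.0, 1.0], [1.0, 2.0]))
```

On this function, each call of `descend` advances along the gully by only about a tenth of the remaining distance. A few gully steps, by contrast, cover most of the distance to the minimum.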

We have described the method for descent (i.e., for finding the minimum
of the estimating function f). Of course, the corresponding
constructions are also applicable to ascent (finding the maximum of the
function f). In this case, instead of descending into the "gully," we
ascend to the "ridge." The further advance then runs not along the
"bottom of the gully" but along the "crest of the ridge."

All the descent (ascent) methods described belong to the class of
methods for finding extrema of functions or functionals, which we group
under the name of variational methods. In most of the cases of interest
for cybernetics, particular restrictions on the possible values of the
arguments of the estimating function f become considerably important.
In this case the sought extrema may be reached on the boundaries of the
domain of definition of the function f rather than within the domain.
The descent (ascent) methods considered above are in principle also
suitable for finding such "boundary" extrema. However, in several
particular cases certain special variational methods are far more
effective.

Among methods of this sort are the so-called linear programming (or
linear planning) methods. These methods are used when the estimating
function f is a linear function (a polynomial of the first degree),
$f = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + a_0$, and the boundaries of the
domain of definition of the variables are composed of hyperplanes,
i.e., surfaces specified by equations of the first degree. In this case
the domain of definition is a multidimensional polyhedron (not
necessarily bounded), all points of which satisfy a system of linear
inequalities of the form
$$b_{i1} x_1 + b_{i2} x_2 + \dots + b_{in} x_n + b_{i0} \ge 0 \quad (i = 1, 2, \dots, m).$$
The signs of the inequalities can, of course, be reversed by changing
the signs of the coefficients $b_{ij}$.

It is not difficult to see that, if the extremum exists, the estimating
function f takes its extremal value at one or several vertices of this
polyhedron. In the latter case f takes the extremal value on the whole
face (generally speaking, multidimensional) passing through all these
vertices. Therefore we can find the desired extremal value by trying
all the vertices of the polyhedron one by one. However, with a large
number of variables this method is extremely cumbersome and unsuitable
in practice. Far more effective methods for solving the linear
programming problem have already been developed. These methods work not
with the linear inequalities but with linear equations, to which any
inequalities can be reduced by introducing new unknowns. With this
introduction the inequalities $\sum_{j=1}^{n} b_{ij} x_j + b_{i0} \ge 0$ are
replaced by the equations $\sum_{j=1}^{n} b_{ij} x_j + b_{i0} = y_i$, where the
$y_i$ are new unknowns that can take only nonnegative values
(i = 1, 2, ..., m).
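For two variables, the vertex-by-vertex search mentioned above can be sketched as follows. Every pair of boundary lines is intersected, infeasible intersections are discarded, and the best remaining vertex is kept. The sketch assumes a bounded polygon, and the function name and example data are hypothetical. With n variables one must instead try combinations of n hyperplanes out of m. The number of such combinations grows very quickly, which is why the method becomes impractical.

```python
from itertools import combinations

def best_vertex(a, a0, constraints, tol=1e-9):
    """Minimize f = a[0]*x1 + a[1]*x2 + a0 over a bounded polygon given by
    constraints (b1, b2, b0), each meaning b1*x1 + b2*x2 + b0 >= 0, by
    trying every vertex (intersection of two boundary lines) one by one."""
    best = None
    for (p1, p2, p0), (q1, q2, q0) in combinations(constraints, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < tol:
            continue                              # parallel lines: no vertex
        # Cramer's rule for p1*x + p2*y = -p0, q1*x + q2*y = -q0
        x = (-p0 * q2 + p2 * q0) / det
        y = (-p1 * q0 + p0 * q1) / det
        if all(b1 * x + b2 * y + b0 >= -tol for b1, b2, b0 in constraints):
            value = a[0] * x + a[1] * y + a0
            if best is None or value < best[0]:
                best = (value, (x, y))
    return best

# Hypothetical example: minimize -x1 - 2*x2 with x1 >= 0, x2 >= 0,
# 4 - x1 - x2 >= 0 and 3 - x2 >= 0.
print(best_vertex((-1, -2), 0, [(1, 0, 0), (0, 1, 0), (-1, -1, 4), (0, -1, 3)]))
```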

In practice we most frequently encounter the following statement of the
linear programming problem, which we shall term the canonical form.

Given a system of m linear algebraic equations in n unknowns,
$\sum_{j=1}^{n} a_{ij} x_j = b_i$ (i = 1, 2, ..., m), it is required to find
a nonnegative solution (all $x_j \ge 0$) of this system for which some
fixed linear form (the estimating function) $\sum_{j=1}^{n} c_j x_j$ takes the
smallest possible value.

We shall describe one of the possible effective methods for solving
this problem, which is usually termed the simplex method.

In the simplex method we perform successive transformations of the
given system of equations $\sum_{j=1}^{n} a_{ij} x_j = b_i$ (i = 1, 2, ..., m)
until it is reduced to a special form. First, all the free terms $b_i$
are made nonnegative (if $b_i < 0$, it suffices to change the signs of
both sides of the ith equation). Then the equations are rewritten in the
form $0 = b_i - \sum_{j=1}^{n} a_{ij} x_j$ (i = 1, 2, ..., m). Suppose that in
the resulting system some variable $x_k$ appears in only one equation,
say the ith, and has a positive coefficient $a_{ik}$ there. Then this
variable is called basic, and the corresponding equation is solved with
respect to it. The old notation is kept for the coefficients and free
terms obtained after division by $a_{ik} > 0$. Denoting all the basic
variables (after renumbering) by $x_1, x_2, \dots, x_{k_0}$, we reduce our
system to the form

$$x_k = b_k - \sum_{j=k_0+1}^{n} a_{kj} x_j \quad (k = 1, 2, \dots, k_0),$$
$$0 = b_i - \sum_{j=k_0+1}^{n} a_{ij} x_j \quad (i = k_0 + 1, k_0 + 2, \dots, m). \qquad (11.4)$$


The equations of the second group (not solved with respect to basic
variables) are termed 0-equations. It is quite possible that all the
equations of the system fall into this group. The purpose of the
further transformations is to find some nonnegative solution of system
(11.4). These transformations reduce to the repetition (generally
speaking, multiple) of a cycle consisting of the following steps
(see [6]):

1. We find a 0-equation whose free term is strictly greater than zero.
(If there is no such 0-equation, the problem is solved. Setting
$x_k = b_k$ (k = 1, 2, ..., $k_0$) and $x_j = 0$ (j = $k_0 + 1$, $k_0 + 2$,
..., n), we obviously obtain a nonnegative solution of system (11.4).)
Let this be the $i_1$th equation.

2. In the equation found (the $i_1$th) we pick some positive coefficient
$a_{i_1 j_1}$. (Suppose instead that all the coefficients $a_{i_1 j}$ in the
$i_1$th equation are nonpositive. Then for $x_j \ge 0$ the right-hand side
of this equation is at least $b_{i_1} > 0$. System (11.4) therefore cannot
have nonnegative solutions, and the posed linear programming problem is
unsolvable.)

3. In the same column as the chosen coefficient $a_{i_1 j_1}$ (i.e., in
the $j_1$th column) we find the so-called resolving coefficient
$a_{i_2 j_1}$. It is characterized by the fact that, of all the ratios
$b_i / a_{i j_1}$ with positive $a_{i j_1}$ (i = 1, 2, ..., m), the ratio
$b_{i_2} / a_{i_2 j_1}$ has the smallest possible value.

4. The equation in which the resolving coefficient appears (i.e., the
$i_2$th equation) is solved with respect to the variable $x_{j_1}$, which
from then on is counted among the basic variables. The expression found
for $x_{j_1}$ is substituted into all the remaining equations. (If the
$i_2$th equation is not a 0-equation, the variable $x_{i_2}$ standing on
its left side is removed from the basic variables.)

5. After the $i_2$th equation has been solved (with respect to $x_{j_1}$),
we again find a 0-equation with a positive free term, and the entire
operation described above is performed with it.

The described process of successive solution is continued until no
0-equations with a positive free term remain. The desired nonnegative
solution is obtained by equating all the basic variables $x_k$ to the
corresponding free terms $b_k$ and all the nonbasic variables to zero.
In some cases the described process cycles and continues indefinitely
without leading to any solution. In these cases we resort to varying
the choice of the 0-equation and of the resolving coefficient in the
2nd and 3rd steps of the successive solution process, which usually
helps prevent cycling.
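Steps 1-5 can be sketched in Python with exact rational arithmetic. This is a minimal sketch under stated assumptions:

- step 2 takes the first positive coefficient;
- ties in the ratio test go to the lowest row index;
- no anti-cycling safeguard is included;
- the names are illustrative.

The sketch keeps the rows as $\sum_j a_{ij} x_j = b_i$, which is the same system as the text's form $0 = b_i - \sum_j a_{ij} x_j$.

```python
from fractions import Fraction

def simplex_first_stage(A, b):
    """Find a nonnegative solution of A x = b by the 0-equation procedure.

    Returns (x, basis); basis[i] is the basic variable of row i, or None if
    row i is still a 0-equation. Raises ValueError if no such solution exists.
    """
    m, n = len(A), len(A[0])
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for i in range(m):                      # make all free terms nonnegative
        if b[i] < 0:
            A[i] = [-v for v in A[i]]
            b[i] = -b[i]
    basis = [None] * m
    for j in range(n):                      # variables already in basic form
        rows = [i for i in range(m) if A[i][j] != 0]
        if len(rows) == 1 and A[rows[0]][j] > 0 and basis[rows[0]] is None:
            i = rows[0]
            piv = A[i][j]
            A[i] = [v / piv for v in A[i]]
            b[i] /= piv
            basis[i] = j
    while True:
        # Step 1: a 0-equation with a strictly positive free term.
        i1 = next((i for i in range(m) if basis[i] is None and b[i] > 0), None)
        if i1 is None:
            break
        # Step 2: a positive coefficient in that equation.
        j1 = next((j for j in range(n) if A[i1][j] > 0), None)
        if j1 is None:
            raise ValueError("no nonnegative solution exists")
        # Step 3: resolving coefficient by the minimal ratio b_i / a_i,j1.
        i2 = min((i for i in range(m) if A[i][j1] > 0),
                 key=lambda i: b[i] / A[i][j1])
        # Step 4: solve equation i2 for x_j1 and substitute it elsewhere.
        piv = A[i2][j1]
        A[i2] = [v / piv for v in A[i2]]
        b[i2] /= piv
        for i in range(m):
            if i != i2 and A[i][j1] != 0:
                k = A[i][j1]
                A[i] = [u - k * v for u, v in zip(A[i], A[i2])]
                b[i] -= k * b[i2]
        basis[i2] = j1                      # the old basic variable (if any) leaves
    x = [Fraction(0)] * n
    for i in range(m):
        if basis[i] is not None:
            x[basis[i]] = b[i]
    return x, basis

print(simplex_first_stage([[1, 1], [1, -1]], [2, 0]))
```

Consider the system $x_1 + x_2 = 2$, $x_1 - x_2 = 0$. Neither variable is basic at the start, so both equations are 0-equations. Two cycles of steps 1-5 then give $x_1 = x_2 = 1$.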

After the successive solution process terminates, the expressions found
for the basic variables are substituted into the estimating function,
which as a result takes the form $f = d - \sum_{j=r+1}^{n} d_j x_j$. In the
latter expression the summation extends only over the nonbasic
variables, which (after appropriate renumbering) are assigned the
numbers from r + 1 to n.

The solution process described above is then applied to two things.
The first is the system of equations for the basic variables obtained
from the first application of this process, which is written as
$$x_i = b_i - \sum_{j=r+1}^{n} a_{ij} x_j \quad (i = 1, 2, \dots, r).$$
The second is the new 0-equation $0 = d - \sum_{j=r+1}^{n} d_j x_j$. As
resolving coefficients we select only the coefficients $a_{ij}$. The
process terminates when all the coefficients $d_j$ in the 0-equation
become nonpositive. We then set all the nonbasic variables equal to zero
and all the basic variables equal to the corresponding free terms. This
gives the required solution of the original linear programming problem,
and the minimal value of f equals the final value of d. We note that in
both the first and the second application of the successive solution
process, all the free terms (with the exception of the term d) remain
nonnegative throughout.
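The second stage can be sketched in Python with exact rational arithmetic. The sketch assumes that the first stage has already produced a system in which every row has a basic variable (no 0-equations remain). The basic columns are unit vectors and all free terms are nonnegative. The column rule "first positive $d_j$" is one arbitrary choice. Like the procedure in the text, it may cycle on degenerate problems. Function and variable names are illustrative.

```python
from fractions import Fraction

def simplex_second_stage(A, b, c, basis):
    """Minimize sum(c[j] * x[j]) subject to A x = b, x >= 0.

    Assumes basis[i] is the basic variable of row i, column basis[i] of A
    is the i-th unit vector, and all b[i] >= 0 (no 0-equations left).
    """
    m, n = len(A), len(A[0])
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    basis = list(basis)
    # Substitute the basic variables into f, giving f = d - sum(d_j * x_j).
    d = sum(Fraction(c[basis[i]]) * b[i] for i in range(m))
    dj = [sum(Fraction(c[basis[i]]) * A[i][j] for i in range(m)) - c[j]
          for j in range(n)]
    while True:
        # A positive d_j means that increasing x_j decreases f.
        col = next((j for j in range(n) if dj[j] > 0), None)
        if col is None:
            break                                   # all d_j <= 0: optimum
        rows = [i for i in range(m) if A[i][col] > 0]
        if not rows:
            raise ValueError("f is unbounded below on the feasible set")
        # Resolving coefficient: smallest ratio b_i / a_i,col over a_i,col > 0.
        r = min(rows, key=lambda i: b[i] / A[i][col])
        piv = A[r][col]
        A[r] = [v / piv for v in A[r]]
        b[r] /= piv
        # Substitute the new expression for x_col into the other rows and f.
        for i in range(m):
            if i != r and A[i][col] != 0:
                k = A[i][col]
                A[i] = [u - k * v for u, v in zip(A[i], A[r])]
                b[i] -= k * b[r]
        k = dj[col]
        dj = [u - k * v for u, v in zip(dj, A[r])]
        d -= k * b[r]
        basis[r] = col
    x = [Fraction(0)] * n
    for i in range(m):
        x[basis[i]] = b[i]
    return x, d                                     # optimal x and min f = d

# Minimize -x1 - 2*x2 with x1 + x2 + x3 = 4, x2 + x4 = 3, all x_j >= 0.
print(simplex_second_stage([[1, 1, 1, 0], [0, 1, 0, 1]], [4, 3],
                           [-1, -2, 0, 0], [2, 3]))
```

In the example, $x_3$ and $x_4$ play the role of the new nonnegative unknowns introduced for inequalities. The procedure performs two pivots and stops at $x = (1, 3, 0, 0)$ with $d = -7$.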

Linear programming is widely used in problems of optimal planning of
transport shipments. Such applications of linear programming were first
developed by Kantorovich [38]. Detailed justifications of the simplex
method described here can be found in special monographs devoted to
linear programming.

We shall describe still another general scheme of variational problems,
to which many problems of so-called dynamic programming (or dynamic
planning) are reduced [7]. The essence of this scheme is as follows. In
some (generally speaking, multidimensional) Euclidean space we single
out, with the aid of certain restrictions, a certain class of curves
which we shall term trajectories. On the set of all possible
trajectories some estimating function (or, as we usually say,
functional) is given. The problem consists in finding the trajectory on
which the value of this functional is greatest or least.

Let us consider one of the quite general numerical (approximate)
methods of solving this problem, developed by Mikhalevich and Shor: the
method of sequential analysis of variants. We shall discuss this method
as applied to one of the simplest cases. In this case:

- the basic space is two-dimensional;
- the class K of trajectories consists of all the piecewise-smooth
  curves that connect two fixed points A and B of the space and lie
  wholly in some fixed finite region Q;
- the estimating functional F has the property of additivity.

The additivity property means the following. The functional F is
considered defined not only on whole trajectories but also on any of
their pieces (open subsets and their closures). When two disjoint
pieces are combined into one, the corresponding values of the
estimating functional are added.

The  formulated  conditions  correspond  to  the  dynamic  programming 
problem  in  the  Bellman  formulation. 

The first step in the method of sequential analysis of variants is the
restriction of the class K. Of all the trajectories connecting the
points A and B, we single out only certain broken lines. To do this we
draw several sections (straight lines in the present case) perpendicular
to the segment AB and intersecting it. The vertices of the broken lines
considered can be located only on the selected sections. Further, each
section (within the limits of the region Q) is approximated by a finite
set of points, the possible vertices of the broken lines. The density
of the points of the approximating set, and the density of the sections,
are chosen on the basis of the required accuracy of the solution. The
required constructions are shown graphically in Fig. 16.

In Fig. 16 the boundary of the region Q is shown by the dashed curve,
the sections are denoted by Roman numerals, and the points of the sets
which approximate the sections are denoted by Arabic numerals. Let the
number of points in the ith section be denoted by $n_i$ and the total
number of sections by m. Then, as is easily seen, the total number N of
broken lines satisfying the stated conditions is the product of the
numbers $n_i$: $N = n_1 n_2 \cdots n_m$. This quantity increases very
rapidly as the number of points and sections increases. As a result,
trying all the variants is practically impossible.


Fig. 16


However, we can shorten the enumeration by using the following
technique. Let us connect point A by straight-line segments with all
the points of the first section and calculate the value of the
estimating functional F for all these segments. To each point j of the
first section we assign the value $F_j^1$ of the functional F on the
segment connecting this point with the point A. For each point k of the
second section we then find the point $j_k$ of the first section with the
following property: the value of F on the broken line $A j_k k$ is the
smallest of its values on all the permissible broken lines connecting
the point A with the point k. Since the functional F is additive, the
question reduces to minimizing over j the sum $F_j^1 + F(j, k)$, where
F(j, k) is the value of the functional F on the segment [jk]. The
corresponding minimal value of the functional F on the broken line
$A j_k k$ is denoted by $F_k^2$. To find it we use the values $F_j^1$ already
found at the points of the preceding (first) section; these values are
necessarily minimal here. The enumeration is thus limited to all
possible segments connecting the points of the first and second
sections.

Let us assume that for all the points p of the ith section we have
already found the minimal values $F_p^i$ of the functional F on all the
permissible broken lines connecting these points with the point A. Let
us consider the portion between the ith and the (i + 1)th sections
(Fig. 17). For each point q of the (i + 1)th section we find the point p
of the ith section such that the sum $F_p^i + F(p, q)$ is minimal. It is
evident that the minimal value of this sum is the minimal possible value
of the estimating functional F on all the permissible broken lines
connecting the point A with the point q. We record this value
$F_q^{i+1}$ and let the point q run through all the points of the
(i + 1)th section. We can then proceed, by completely analogous
constructions, to the portion between the (i + 1)th and the (i + 2)th
sections. From the work on the portion between the ith and the
(i + 1)th sections we need to remember only the following, apart from
the values $F_q^{i+1}$ used for the next portion: the function
$\varphi_i(q)$, which assigns to each point q of the (i + 1)th section the
point $p = \varphi_i(q)$ of the ith section with which it connects most
favorably. In the case shown in Fig. 17,

$$\varphi_i(1) = 2, \quad \varphi_i(2) = 3, \quad \varphi_i(3) = 4, \quad \varphi_i(4) = 5, \quad \varphi_i(5) = 3.$$

Repeating this process, we come finally to the portion between the last
(mth) section and the final point B. For the point B we find the point
$r = \varphi_m(B)$ of the mth section with which it connects most
favorably. From the recorded functions $\varphi_i$ (i = 1, 2, ..., m) we can
then find the best trajectory (admissible broken line)
$A k_1 k_2 \ldots k_i \ldots k_m B$ connecting the points A and B. The vertices of
this broken line are determined successively (from right to left) by
the relations $k_i = \varphi_i(k_{i+1})$ (i = m, m - 1, ..., 1), where the
point B is taken as $k_{m+1}$. It is easy to see that the broken line found
actually minimizes the estimating functional F. The enumeration used in
finding it is obviously limited to only
$n_1 + n_1 n_2 + n_2 n_3 + \dots + n_{m-1} n_m + n_m$ variants, instead of the
$n_1 n_2 \cdots n_m$ variants of complete enumeration ($n_i$ is the number of
points in the ith section).
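The scheme is easy to sketch in Python for the two-dimensional case. Sections are given as lists of candidate vertices. The functional `segment_cost` is hypothetical and additive: segment length weighted by a made-up "terrain" factor. The function `brute_force` performs the complete enumeration for comparison. All names are illustrative.

```python
import math
from itertools import product

def segment_cost(p, q):
    """Hypothetical additive functional F on a segment: its length weighted
    by 1 + (y - 1)^2 at the midpoint (cheapest near the line y = 1)."""
    mid_y = (p[1] + q[1]) / 2
    return math.dist(p, q) * (1 + (mid_y - 1) ** 2)

def best_broken_line(A, B, sections, cost=segment_cost):
    """Sequential analysis of variants: returns (minimal F, vertices A..B)."""
    F = [cost(A, k) for k in sections[0]]           # F^1 at points of section 1
    phi = []                                        # phi[i][q] -> best p in section i
    for prev, cur in zip(sections, sections[1:]):
        new_F, back = [], []
        for q in cur:
            p = min(range(len(prev)), key=lambda j: F[j] + cost(prev[j], q))
            new_F.append(F[p] + cost(prev[p], q))
            back.append(p)
        F, phi = new_F, phi + [back]
    last = sections[-1]
    r = min(range(len(last)), key=lambda j: F[j] + cost(last[j], B))
    total = F[r] + cost(last[r], B)
    idx = [r]                                       # recover k_m, ..., k_1 right to left
    for back in reversed(phi):
        idx.append(back[idx[-1]])
    idx.reverse()
    return total, [A] + [sec[j] for sec, j in zip(sections, idx)] + [B]

def brute_force(A, B, sections, cost=segment_cost):
    """Complete enumeration of all n_1 * n_2 * ... * n_m broken lines."""
    best = None
    for choice in product(*sections):
        pts = [A, *choice, B]
        total = sum(cost(s, t) for s, t in zip(pts, pts[1:]))
        if best is None or total < best[0]:
            best = (total, pts)
    return best

sections = [[(x, y) for y in (-2, -1, 0, 1, 2)] for x in (1, 2, 3)]
print(best_broken_line((0, 0), (4, 0), sections))
print(brute_force((0, 0), (4, 0), sections))
```

On this small grid (three sections of five points), the complete enumeration examines 125 broken lines. The sequential scheme evaluates only 5 + 25 + 25 + 5 = 60 segments. With ten sections of ten points each, the figures would be $10^{10}$ against 10 + 9·100 + 10 = 920.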

The described method generalizes directly to multidimensional spaces.
The requirement that the estimating functional F be additive is also not
strictly necessary. It is obviously sufficient to assume that, for any
initial piece APQ of a trajectory, the value F(APQ) of the functional
can be represented in the form F(APQ) = f(F(AP), F(PQ)). Here f(x, y) is
a real function which does not decrease with respect to x for any value
of y.

We note further that in most cases the enumeration of the different
variants of connecting the points of two neighboring sections can be
considerably reduced by various sorts of restrictions. An example is a
restriction on the maximal slope of the segments of the broken line
relative to the x axis in the two-dimensional case. In several cases
the enumeration can be reduced by using the continuity of the
estimating functional.

More complex constructions in the method of sequential analysis of
variants are associated with a special systematization of the
restrictions and functionals. This systematization permits the
construction of algorithms that search for the optimal trajectory by
successively detecting and discarding "unpromising" segments of
trajectories.

Chapter  5

ELECTRONIC  DIGITAL  MACHINES  AND  PROGRAMMING 
§1.  THE  UNIVERSAL  PROGRAM  AUTOMATON 

One of the most significant technical achievements of our time has been
the creation of universal program automata, i.e., automatic information
processors that make it possible to realize any algorithm. The modern
universal electronic (digital) computers are such automata. It is
interesting to note that, as the name itself indicates, these machines
were created to automate computations, more exactly, to automate the
performance of any computational algorithm. The creators of the first
universal computers understood the term "universal," as applied to these
machines, in the sense of universality specifically with respect to
computational algorithms. Many people still understand it this way
today.

However, as we noted in Chapter 1, any algorithm can be reduced to the
calculation of some partial recursive (arithmetic) function. Therefore
universality with respect to computational algorithms turns out to be
universality in general. This circumstance is of great practical and
theoretical importance, since the basis of any field of human activity
is in fact the processing of information in accordance with particular,
frequently very complex, sets of algorithms.

The availability of universal automatic information processors such as
the modern universal electronic computers makes it possible, at least
in principle, to automate any field of human activity that is based on
the processing of information. Examples include the solution of complex
design problems, planning, production control, translation from one
language to another, composition of music, playing chess, and many
others. It is curious that the tremendous possibilities inherent in
universal electronic machines were not only unrecognized by their first
designers, but were even disputed by some of them.

In this connection we must note still another error which is common
among people who are not familiar with the theory of algorithms. The
idea is prevalent that the amazing properties of modern electronic
digital machines are based on some specific characteristics of the
elements used in them: electron tubes, transistors, etc. In actuality,
electronics in itself has no bearing on their theoretical (qualitative)
capabilities.

The essence of the matter lies in the specific control principle and in
the set of operations which these machines can perform. The elements
from which they are constructed can be of quite varied physical nature
and can, in particular, be purely mechanical. Electronic elements are
used to increase the operating speed of the computers significantly,
and also to improve their reliability (reckoned per given number of
operations performed by the machine). We note that the first universal
digital computer (Mark-1) was built using electromechanical elements
(electromagnetic relays) rather than electronic elements.

The control principle which provides algorithmic universality (the
capability of realizing any algorithm) in the modern universal digital
machines is a development and generalization of the principle underlying
the algorithmic scheme of Post described in Chapter 1. Just as in the
Post scheme, the information in the universal digital machine is stored
in a memory which is divided into individual cells (memory cells).
However, in contrast with the Post scheme, each cell may store not a
single binary digit (0 or 1) but an entire word. Such a word is composed
of a considerable number (usually 30-40) of binary digits. We can, if
convenient, consider these words as letters in some finite alphabet.
However, as a rule this alphabet will contain a very large number of
letters ($2^{30}$ to $2^{40}$).

Therefore we usually prefer to consider the contents of each memory
cell not as an individual letter but as a word in a binary alphabet.
The binary digits (0 and 1) composing this word are usually termed
(binary) places. The word itself is termed a binary code, sometimes
simply a (binary) number. We can, of course, take as letters not the
contents of individual binary places but certain combinations of these
places. For example, any binary code whose length is a multiple of three
can be considered as a number in the octal notation system. Triads of
binary digits then designate the octal digits: 000-0, 001-1, 010-2,
011-3, 100-4, 101-5, 110-6, 111-7. Similarly, groups of four binary
digits (not all sixteen of their combinations) can designate the
decimal digits. Binary codes consisting of such tetrads can then be
represented by numbers in the decimal notation system.
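Both groupings can be sketched in Python: triads of binary places read as octal digits, and decimal digits written as four-place tetrads. The 8-4-2-1 weighting of the tetrads is an assumption. The text says only that four binary places are used per decimal digit.

```python
def binary_to_octal(code: str) -> str:
    """Read a binary code as an octal number by splitting it into triads."""
    width = -(-len(code) // 3) * 3          # round the length up to a multiple of 3
    code = code.zfill(width)                # pad with leading zeros
    return "".join(str(int(code[i:i + 3], 2)) for i in range(0, width, 3))

def decimal_to_tetrads(number: int) -> str:
    """Write each decimal digit of a nonnegative integer as a four-place
    binary tetrad (8-4-2-1 weights, an assumption); only 10 of the 16
    possible tetrads are ever used."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(binary_to_octal("101110"))     # triads 101 110 -> octal 56
print(decimal_to_tetrads(409))       # 0100 0000 1001
```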

In addition to the replacement of binary digits by multiplace binary codes, there is a second essential difference between the organization of the memory (storage device) of a universal digital machine and the memory of algorithms in the Post scheme. Being an abstract algorithmic scheme, the Post scheme assumed an infinite number of cells, that is, a memory of unlimited volume. In real technical devices such as universal digital machines, on the other hand, the size of the memory is of necessity limited.

In modern large universal digital machines the size of the high-speed memory does not exceed 100,000 cells (usually no more than 4000-8000). This circumstance must be kept in mind, since it is closely related to the concept of machine universality. Strictly speaking, to be able to realize any algorithm, a universal digital machine must accommodate the writing (representation) of that algorithm in its memory. Since the representation of an algorithm can be arbitrarily long, the machine memory would have to be infinite for the machine actually to be capable of realizing any algorithm.

Keeping in mind, however, that an infinite memory cannot be realized in any technical device, it is customary to term a machine universal if the organization of its control and its set of operations are such that they would make it possible to realize any algorithm under the condition of unlimited memory size.

In practice the universality of modern machines is provided by the fact that, in addition to the high-speed (so-called operational) memory device, they are also equipped with relatively slow (so-called external) memory devices capable of exchanging information with the operational memory device. The capacity of the external memory (usually composed of magnetic tapes) can be considered practically unlimited, and this (together with the possibility of exchanging codes with the operational memory) determines the practical possibility of performing any algorithm on the machine.

The sequence of operations performed by a universal digital machine is determined by the program established in its memory. The program is an ordered finite set of instructions, which can be considered a natural generalization of the orders used in constructing the Post algorithmic scheme. In the Post scheme the active cell is displaced by no more than one step to the right or left with the performance of each successive order; in a universal digital machine, by contrast, provision is made for arbitrary variation of the position of the active cell from order to order. To do this, each order contains the numbers of one or several memory cells that are active during the performance of that order.

The numbers of the memory cells in universal digital machines are customarily termed addresses. The number of addresses in the orders of modern universal digital machines (the number of memory cells active in the performance of any of these orders) usually varies from 1 to 4. Correspondingly, we distinguish single-address, two-address, three-address and four-address orders.

Let us first consider machines with a four-address system of orders, i.e., machines in which the maximal number of addresses in an order equals four. Different types of orders correspond to the different operations the machine can perform. The orders are usually recorded in the machine in the form of binary codes, which can be stored in the machine memory (both operational and external).

We will assume that each memory cell can contain either one order (also termed a command or command word) or one information word. Just as was done above for information words, each command word (order code) can, if desired, be considered as a word in any finite (not necessarily binary) alphabet.

Any command word is divided into an operational part and an address part. The operational part contains the code of the operation performed while the order represented by this command word is in action. All orders of the same type have identical operational parts. The address part of the order contains the addresses of the cells that are active at the time of action of this order.
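To make the division into an operational part and an address part concrete, here is a minimal Python sketch that packs an operation code and four addresses into a single integer command word and unpacks it again. The field widths (4 places for the operation, 8 places per address, 36 places in all) are assumptions chosen only to fall within the 30-40 place word length mentioned above; real machines used their own layouts.

```python
OPCODE_BITS = 4    # assumed width of the operational part
ADDRESS_BITS = 8   # assumed width of each address (cells 0..255)


def encode(opcode: int, a1: int, a2: int, a3: int, a4: int) -> int:
    """Pack the operational part and four addresses into one command word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError(f"operation code {opcode} does not fit")
    word = opcode
    for address in (a1, a2, a3, a4):
        if not 0 <= address < (1 << ADDRESS_BITS):
            raise ValueError(f"address {address} does not fit")
        word = (word << ADDRESS_BITS) | address   # append the next address field
    return word


def decode(word: int) -> tuple:
    """Split a command word back into (opcode, a1, a2, a3, a4)."""
    mask = (1 << ADDRESS_BITS) - 1
    fields = []
    for _ in range(4):
        fields.append(word & mask)   # lowest field first: a4, a3, a2, a1
        word >>= ADDRESS_BITS
    a4, a3, a2, a1 = fields
    return word, a1, a2, a3, a4      # what remains is the operational part


# e.g. "add cells 10 and 11, send the result to 20, next order in 100"
word = encode(1, 10, 11, 20, 100)
print(f"{word:036b}")
print(decode(word))                  # (1, 10, 11, 20, 100)
```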

The operations performed by universal digital machines are usually divided into several classes. The first class includes the arithmetic operations: addition, subtraction, multiplication and division. The four-address orders for performing the arithmetic operations usually have the following structure. The operational part of the order contains a code number designating the particular operation (for example, one for addition, two for subtraction, three for multiplication, four for division). The first two addresses of the order indicate the memory cells that store the numbers on which the operation is to be performed, i.e., the addresses of the addends in the case of addition, of the minuend and subtrahend in the case of subtraction, etc. The third address of the order indicates where the result of the operation (sum, difference, product or quotient) is to be sent. Finally, the fourth address of the order indicates the memory cell that stores the order to be performed after the given one.
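The following Python sketch models this four-address order structure. It keeps orders as tuples rather than binary codes for readability; the operation codes 1 to 4 follow the example in the text, while the stop code 0 and the use of exact fractions for division are assumptions of the sketch.

```python
from fractions import Fraction

# Operation codes 1-4 follow the example in the text; STOP = 0 is an assumption.
STOP, ADD, SUB, MUL, DIV = 0, 1, 2, 3, 4

ARITHMETIC = {
    ADD: lambda x, y: x + y,
    SUB: lambda x, y: x - y,
    MUL: lambda x, y: x * y,
    DIV: lambda x, y: Fraction(x) / y,   # exact quotient
}


def run(memory: dict, start: int, max_steps: int = 10_000) -> dict:
    """Perform four-address orders (op, a1, a2, a3, a4) stored in `memory`.

    a1, a2: cells holding the operands; a3: cell receiving the result;
    a4: cell holding the order to perform next (forced succession).
    """
    address = start
    for _ in range(max_steps):
        op, a1, a2, a3, a4 = memory[address]
        if op == STOP:
            return memory
        memory[a3] = ARITHMETIC[op](memory[a1], memory[a2])
        address = a4
    raise RuntimeError("step limit exceeded")


# (3 + 4) * 5 / 7, with the orders deliberately scattered through memory.
memory = {
    10: 3, 11: 4, 12: 5, 13: 7,
    100: (ADD, 10, 11, 20, 205),   # cell 20 := 3 + 4, next order in 205
    205: (MUL, 20, 12, 21, 150),   # cell 21 := 7 * 5, next order in 150
    150: (DIV, 21, 13, 22, 300),   # cell 22 := 35 / 7, next order in 300
    300: (STOP, 0, 0, 0, 0),
}
run(memory, start=100)
print(memory[21], memory[22])      # 35 5
```

Because every order names its successor, the orders need not occupy consecutive cells.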

The orders for performing logical operations are constructed just as in the case of the arithmetic operations. The logical operations are as a rule two-place operations performed place by place, i.e., separately for each pair of corresponding binary places of the codes participating in the operation. These include, for example, placewise conjunction (logical multiplication) and placewise disjunction (logical addition). There are also single-place logical operations on codes. Such operations include the right and left (logical) shifts, which transform the binary code $x_1 x_2 \ldots x_n$ into the codes $0 x_1 x_2 \ldots x_{n-1}$ and $x_2 x_3 \ldots x_n 0$ respectively.
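A minimal Python sketch of these logical operations, with binary codes written as strings of places $x_1 x_2 \ldots x_n$ from left to right; the function names are hypothetical.

```python
def placewise(op, x: str, y: str) -> str:
    """Apply a two-place Boolean operation separately to each pair of places."""
    if len(x) != len(y):
        raise ValueError("codes must have the same length")
    return "".join(str(op(int(a), int(b))) for a, b in zip(x, y))


def conjunction(x: str, y: str) -> str:   # logical multiplication
    return placewise(lambda a, b: a & b, x, y)


def disjunction(x: str, y: str) -> str:   # logical addition
    return placewise(lambda a, b: a | b, x, y)


def shift_right(x: str) -> str:
    """x1 x2 ... xn  ->  0 x1 ... x(n-1)"""
    return "0" + x[:-1]


def shift_left(x: str) -> str:
    """x1 x2 ... xn  ->  x2 ... xn 0"""
    return x[1:] + "0"


print(conjunction("1100", "1010"))              # 1000
print(disjunction("1100", "1010"))              # 1110
print(shift_right("1011"), shift_left("1011"))  # 0101 0110
```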

A special role is played by the so-called control transfer operations, which serve to vary the order in which the program orders are performed as a function of the results obtained in carrying out the program. A typical control transfer operation (also termed a conditional transfer operation) is the so-called conditional transfer on exact coincidence of words. The first two addresses of the order realizing this operation indicate the memory cells from which the two words to be compared with one another are taken. If these words coincide (are equal), the next order is taken from the memory cell indicated by the third address; if they do not coincide, it is taken from the cell indicated by the fourth address of the conditional transfer order. Conditional transfers of other forms are also possible, for example, conditional transfer on the sign of the difference of two words or on the sign of a single word (in the latter case, of course, three rather than four addresses in the conditional transfer order suffice).
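The two forms of conditional transfer can be sketched as functions that return the address of the next order. In the single-word variant, the rule "negative word goes to the second address" is an assumed convention, since the text does not fix which branch corresponds to which sign.

```python
def next_on_coincidence(memory: dict, a1: int, a2: int, a3: int, a4: int) -> int:
    """Conditional transfer on exact coincidence of words (four addresses).

    Returns the address of the cell holding the next order: a3 if the words
    in cells a1 and a2 coincide, otherwise a4.
    """
    return a3 if memory[a1] == memory[a2] else a4


def next_on_sign(memory: dict, a1: int, a2: int, a3: int) -> int:
    """Conditional transfer on the sign of a single word (three addresses).

    Assumed convention: go to a2 if the word in a1 is negative, else to a3.
    """
    return a2 if memory[a1] < 0 else a3


memory = {1: 5, 2: 5, 3: 6, 4: -2}
print(next_on_coincidence(memory, 1, 2, 100, 200))  # 100 (words coincide)
print(next_on_coincidence(memory, 1, 3, 100, 200))  # 200 (words differ)
print(next_on_sign(memory, 4, 300, 400))            # 300 (negative word)
```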

The memory of universal digital machines is usually arranged so that when a word is selected (read) from any cell to perform a particular operation, a sort of bifurcation of this word takes place: one copy goes to the device that performs the operation, while the other remains in the cell from which it was selected. When a new word is written into a particular memory cell, the information previously contained in this cell is automatically destroyed (erased).

Taking account of these properties of the memory of universal digital machines, it is not difficult to see that to perform any Post algorithm it suffices to use only the operation of (algebraic) addition, one of the conditional transfer operations (for example, conditional transfer on exact coincidence of words), and the machine stop operation.

Indeed, let us agree to operate with only two information words $p_0$ and $p_1$, the first identified with zero and the second with unity of the binary alphabet of a given Post algorithm A.

Let us divide the memory of the universal digital machine M under consideration into three parts. The first part consists of only five cells, $a_{-1}$, $a_0$, $a_1$, $b_0$, $b_1$, in which the words $-1$, $0$, $1$, $p_0$, $p_1$ are placed; the cells of the second part hold the program that simulates the program (scheme) of the algorithm A; finally, the third part of the memory of the machine M simulates the information tape of the algorithm A.

When the algorithm A operates on a specific input word on which it is defined, the original, intermediate and final information occupies only some limited (finite) part of the information tape, since the algorithm operates for only a finite number of steps and at each step writes information into no more than one new cell. Therefore, if the memory of the machine M is sufficiently large, we can place the required portion of the information tape in the third part identified above. By the assumption made above, possible difficulties due to insufficient memory size are not to be taken into account in deciding whether particular algorithms are theoretically representable on the machine.

Let  us  turn  to  the  direct  simulation  of  the  orders  of  the  Post 
algorithm  A  by  the  orders  of  the  machine  M. 

As noted in §4 of Chapter 1, six different types of orders may be encountered in Post algorithms. The order of the sixth type (stop) is simulated directly by the corresponding order of the machine M. The orders of the first two types (writing zero or unity in the cell under consideration) are simulated by orders of the machine M that transfer information from the cell $b_0$ or $b_1$ into the active cell, indicated, for example, by the third address of the machine order. Clearly this transfer can be accomplished by an addition order whose first two addresses are those of the cells $a_0$ and $b_0$, or of the cells $a_0$ and $b_1$, and whose third address is that of the active cell (the fourth address, just as in the Post algorithm, indicates the address of the order to be performed after the present one).

As it is not difficult to see, a Post order of the fifth type is simulated by the machine order for conditional transfer on exact coincidence of words. It suffices to compare the word in the cell $b_1$ with the word in the active cell and to transfer to one of the two orders on the basis of the result of this comparison.
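The following Python sketch illustrates how Post orders of the first, second, fifth and sixth types map onto machine orders using only addition, conditional transfer on coincidence, and stop. The cell numbers, the choice $p_0 = 0$, $p_1 = 1$, the tape cell 50 and the assignment "coincidence with $b_1$ goes to the third address" are all hypothetical choices made for the illustration.

```python
# Hypothetical cell numbers for the five cells of the first part of memory.
A_MINUS1, A0, A1, B0, B1 = 0, 1, 2, 3, 4
P0, P1 = 0, 1          # assumed words standing for the Post digits 0 and 1
ADD, EQ, STOP = "add", "eq", "stop"


def initial_memory() -> dict:
    return {A_MINUS1: -1, A0: 0, A1: 1, B0: P0, B1: P1}


def execute(memory: dict, start: int, max_steps: int = 1000) -> int:
    """Run four-address orders; return the address of the stop order reached."""
    address = start
    for _ in range(max_steps):
        op, x, y, z, w = memory[address]
        if op == STOP:
            return address
        if op == ADD:      # cell z := cell x + cell y, next order in w
            memory[z] = memory[x] + memory[y]
            address = w
        else:              # EQ: go to z if the words coincide, else to w
            address = z if memory[x] == memory[y] else w
    raise RuntimeError("step limit exceeded")


TAPE = 50                  # hypothetical active tape cell
memory = initial_memory()
memory.update({
    10: (ADD, A0, B1, TAPE, 11),   # Post type 2: write 1 (0 + p1) into the active cell
    11: (EQ, B1, TAPE, 12, 13),    # Post type 5: does the active cell hold p1?
    12: (STOP, 0, 0, 0, 0),        # reached if it holds p1
    13: (STOP, 0, 0, 0, 0),        # reached if it holds p0
})
print(execute(memory, 10), memory[TAPE])   # 12 1
```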

Finally, each Post order $q'$ of the third or fourth type is simulated by a group of machine orders whose number equals the number of orders of the first, second and fifth types in the Post algorithm A under consideration. Indeed, let $r'$ be any order of the first, second or fifth type of the algorithm A. By what was said above, it is simulated by a single machine order, which we denote by $r$. Let the Post order $q'$ displace the active cell one unit to the right. In the machine program this displacement can be accomplished only by increasing by 1 the address of the active cell in every order that refers to it.

Such addresses, however, occur only in the machine orders that simulate Post orders of the first, second and fifth types. By suitable numbering of the addresses we can, without loss of generality, assume that the address of the active Post cell is written in some definite address of the order, say the last. Regarding the codes as whole binary numbers, we conclude that to increase the address of the active cell in the order $r$ by 1 it suffices to add to the code of the command $r$ the code $+1$ stored in the cell $a_1$. A shift of the active cell to the left is accomplished analogously, by adding the code $-1$ stored in the cell $a_{-1}$.
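Readdressing by ordinary addition can be sketched as follows, assuming, as the text allows, that the active-cell address occupies the last (lowest-order) field of the command code. The 12-place field width and the cell numbers are hypothetical. The sketch also makes visible a condition the argument relies on: the address must not overflow its field, or the addition would carry into the neighbouring field.

```python
ADDRESS_BITS = 12          # hypothetical width of each address field


def pack(op: int, a1: int, a2: int, a3: int, a4: int) -> int:
    """Command word with the operation code in the highest places."""
    word = op
    for address in (a1, a2, a3, a4):
        word = (word << ADDRESS_BITS) | address
    return word


def last_address(word: int) -> int:
    return word & ((1 << ADDRESS_BITS) - 1)


def shift_active_cell(memory: dict, order_cells: list, constant: int) -> None:
    """Simulate one Post shift: add the constant (+1 from a_1, or -1 from
    a_{-1}) to every order whose last address names the active tape cell.
    The tape address must stay inside its field, or the addition would
    carry into (or borrow from) the neighbouring field."""
    for cell in order_cells:
        memory[cell] = memory[cell] + constant


# Two hypothetical orders r whose last address is the active tape cell 500.
memory = {10: pack(1, 1, 4, 11, 500), 11: pack(5, 4, 12, 13, 500)}
shift_active_cell(memory, [10, 11], +1)               # Post order: move right
print([last_address(memory[c]) for c in (10, 11)])    # [501, 501]
shift_active_cell(memory, [10, 11], -1)               # Post order: move left
print([last_address(memory[c]) for c in (10, 11)])    # [500, 500]
```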

From what we have said it is clear that any program-controlled digital automaton will be algorithmically universal if the operations it performs allow us to accomplish the following four operations:

1) transfer of the contents of any memory cell to any other memory cell;

2) addition, to the code of an order located in any memory cell, of constants that change the value of a given (first, second, third or fourth) address of the order by plus 1 or minus 1;

3) conditional transfer on exact coincidence of words;

4) (unconditional) machine stop.

In the case considered above, operations 1) and 2) are provided by the same machine operation, algebraic addition. Usually, however, these operations are separated in universal digital machines, the second being termed the readdressing operation or command addition.

Of course, in addition to the indicated operations, the set of operations of every universal digital machine must include operations for the input and output of information, and also (when an external memory is used) operations providing two-way exchange of information between the operational and external memory devices.

Let us consider the question of ways to reduce the number of addresses in the orders. First of all, it is not difficult to see that the fourth address, used to indicate the succeeding order, can be eliminated by positioning the orders in the machine memory so that the address of the order to be performed after any given order is always larger by one than the address of that order itself. In other words, the fourth address becomes unnecessary provided that the arrangement of the instructions in the memory cells corresponds to the order in which the machine performs them.

A violation of the usual (natural) succession of instructions can occur only when the instruction being performed is a conditional transfer command. As noted above, in the four-address system of instructions one address indicates the following instruction when the condition on which the conditional transfer is based is not satisfied, and a second address indicates the following instruction when this condition is satisfied. When the four-address system of instructions is replaced by a three-address system, it is usually assumed that in the first case (the condition is not fulfilled) the instruction performed after the conditional transfer is the one written in the next memory cell in order, and only in the second case (the condition is fulfilled) is one of the addresses used to indicate the instruction to be performed next.

It follows that the fourth address can be made redundant not only in ordinary instructions, which do not alter the subsequent order of performance of the instructions, but also in conditional transfer instructions. The resulting three-address instruction system is usually termed a system with natural succession of instructions, in contrast with the four-address system described earlier, which has forced succession of instructions. The advantage of the latter system lies in the greater freedom it offers in arranging in the memory device the sequence of commands used to control the operation of the machine. The advantage of the three-address system is the simpler structure of the instruction.
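A three-address system with natural succession can be sketched in Python with a counter that simply advances to the next cell, except when a conditional transfer finds its condition fulfilled. The operation names and the summation example are hypothetical; comparing a cell with itself is used here as one way to obtain an unconditional jump from the coincidence test.

```python
STOP, ADD, SUB, JEQ = "stop", "add", "sub", "jeq"


def run(memory: dict, counter: int, max_steps: int = 10_000) -> dict:
    """Three-address orders (op, a1, a2, a3) with natural succession.

    Ordinary orders: cell a3 := cell a1 (op) cell a2, then take the order
    in the next cell. JEQ: if the words in a1 and a2 coincide, the next
    order is taken from a3; otherwise from the next cell in order.
    """
    for _ in range(max_steps):
        op, a1, a2, a3 = memory[counter]
        if op == STOP:
            return memory
        if op == JEQ:
            counter = a3 if memory[a1] == memory[a2] else counter + 1
            continue
        if op == ADD:
            memory[a3] = memory[a1] + memory[a2]
        elif op == SUB:
            memory[a3] = memory[a1] - memory[a2]
        else:
            raise ValueError(f"unknown operation {op!r}")
        counter += 1                      # natural succession
    raise RuntimeError("step limit exceeded")


# Sum 1 + 2 + ... + n. Data cells: 0 = n, 1 = constant 1, 2 = counter i, 3 = sum s.
memory = {0: 5, 1: 1, 2: 0, 3: 0}
memory.update({
    100: (JEQ, 2, 0, 104),    # if i = n, go to the stop order
    101: (ADD, 2, 1, 2),      # i := i + 1
    102: (ADD, 3, 2, 3),      # s := s + i
    103: (JEQ, 1, 1, 100),    # a cell always coincides with itself: unconditional jump
    104: (STOP, 0, 0, 0),
})
run(memory, counter=100)
print(memory[3])              # 15
```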

A further reduction of the number of addresses in the instructions can be achieved by fixing some supplementary memory cell, usually structurally separated from the other memory cells of the machine. Once this cell is fixed, a simple way opens up for reducing the number of addresses in an instruction to the natural minimum, i.e., to a single address. Let us clarify this method using as an example the addition operation, which requires three addresses: the address $a_1$ of the augend, the address $a_2$ of the addend and the address $a_3$ to which the sum is to be sent. With the supplementary cell $b_0$ fixed, this three-address operation can be performed by the sequential performance of three single-address operations: transfer of the number from the cell $a_1$ into the (fixed) cell $b_0$; addition of the number contained in cell $a_2$ to the number in cell $b_0$, with the result written into cell $b_0$; and, finally, transfer of the number from cell $b_0$ into cell $a_3$. Since only the addresses $a_1$, $a_2$, $a_3$ can change here, while the address $b_0$ is fixed once and for all, all three operations are actually realized with single-address instructions.
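The three single-address steps just described can be sketched as follows. The mnemonics "load", "add" and "store" are hypothetical names for the two transfers and the addition; the fixed cell $b_0$ is modelled as a local variable whose address never appears in an instruction.

```python
def run_single_address(program: list, memory: dict) -> dict:
    """Perform single-address instructions (op, address) in natural order.

    b0 is the fixed supplementary cell; because it is fixed once and for
    all, its address never has to appear in an instruction.
    """
    b0 = 0
    for op, address in program:
        if op == "load":          # transfer: cell `address` -> b0
            b0 = memory[address]
        elif op == "add":         # b0 := b0 + cell `address`
            b0 = b0 + memory[address]
        elif op == "store":       # transfer: b0 -> cell `address`
            memory[address] = b0
        else:
            raise ValueError(f"unknown operation {op!r}")
    return memory


# The three-address order "a3 := a1 + a2" with a1 = 1, a2 = 2, a3 = 3:
memory = {1: 12, 2: 30}
run_single_address([("load", 1), ("add", 2), ("store", 3)], memory)
print(memory[3])   # 42
```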

The described method leads to the single-address instruction system used in many modern universal electronic digital machines. Given the single-address system, it is not difficult to construct a two-address instruction system as well. In this case the second address can be used either to indicate the address of the following instruction (a two-address system with forced succession of instructions) or to indicate the addresses of the number codes (information words) on which the operations are performed (a two-address system with natural succession of commands).

Every device having a discrete memory divided into individual cells, whose operation can be controlled by a sequence of command words (instructions) placed in certain of these cells, is termed a program automaton, and this sequence of instructions itself is termed the program of the automaton's operation.

If the set of operations (types of instructions) performed by a program automaton makes it possible to compose from them the operation of transferring information words from any memory cell to any other memory cell, the operation of readdressing (altering the addresses in the instructions) by ±1, the conditional transfer operation and the machine stop operation, and if any finite sequence of operations from this set can be specified as the program of the automaton, then such an automaton is termed a universal program automaton.

Up to the limitations introduced by the fixed memory size, a universal program automaton is capable of reproducing any algorithm, given suitable coding of its input and output alphabets. This conclusion applies not only to conventional algorithms but also to random and self-altering algorithms (see §5 of the present chapter).

§2. STRUCTURE OF THE MODERN UNIVERSAL PROGRAM AUTOMATA

Modern universal program automata consist of five different basic devices: the memory unit (MU), the arithmetic unit (AU), the control unit (CU), the input unit and the output unit. As noted in the preceding section, the memory device serves for memorizing and storing the program of the automaton's operation, and also the initial, final and intermediate information. The input device serves to enter the program and the initial information (the conditions of the problem) into the automaton memory, and the output device serves to output the final information (the answer to the problem posed to the automaton) from the memory.

The arithmetic device, as its name indicates, serves to perform arithmetic operations. However, the arithmetic unit is usually also used to perform other operations, logical ones for example. In this connection it would be more accurate to term the arithmetic device an operational device. However, we shall not deviate from the established terminology for the AU.

Finally, the control unit combines and coordinates the operation of all the remaining devices of the universal program automaton; it selects and decodes the instructions composing the program and organizes their performance. In modern universal program automata the control unit is constructed on the cyclic principle. The essence of this principle is that the operation of the automaton is broken down in time into natural intervals, termed operating cycles, in the course of each of which approximately the same sequence of elementary operations is repeated.

The determination of the beginning and end of the operating cycle is to a certain degree arbitrary, since both can be shifted simultaneously in either direction. We will assume that the operating cycle begins when the command (instruction) to be performed has already been transmitted to the control unit. In the course of the cycle this command is performed: its operational part is used to prepare certain operations of both the control unit itself and the arithmetic unit for performance, and the address part of the command is used to excite the corresponding cells of the MU in order to extract information from them or enter information into them. In multi-address systems the various addresses of the command are decoded sequentially. The operating cycle terminates with the extraction from the MU, and transmission to the CU, of the code of the next instruction to be performed.

We note that the CU not only transmits information to the other units but also receives information from them: the command code from the MU; from the AU, the result of checking the condition that determines to which of the two possible following commands a conditional transfer command passes control; and certain other signals from the AU.

As mentioned in the preceding section, in addition to the operational memory unit (OMU), modern universal program automata are also equipped with an external memory unit (EMU), which is slower acting than the OMU but has a larger capacity. The block diagram of a universal program automaton, which defines the interaction (information exchange) between its basic units, is shown in Fig. 18.

To detail the block diagram, let us consider more closely the structure of the individual units composing it, primarily the OMU, the AU and the CU. Essential component parts of all three of these devices are the so-called registers. A register is a memory cell intended to store one information or command word. However, in contrast with the usual memory cells, which can be accessed only after rather complex preliminary commutation (switching), registers are especially accessible memory cells whose inputs and outputs are directly connected to the circuits that transmit the information.


Depending on how these circuits function, universal program automata (universal digital machines) are divided into two major classes: serial and parallel machines. In parallel machines (automata), when a code is transmitted from register to register all its digits are transmitted simultaneously, while in serial machines they are transferred sequentially, one after another. Clearly, other conditions being equal, parallel machines are faster acting than serial ones, although they require a larger number of parallel channels (equal to the number of digits in the machine codes of the words) for transmitting information between registers, whereas in serial machines one such channel suffices.
Fig. 18. 1) External memory unit; 2) operational memory unit; 3) input; 4) output; 5) arithmetic unit; 6) control unit.


The arithmetic unit of modern universal electronic digital machines usually consists of three registers, one of which is capable of summing the numerical codes transmitted to it and is therefore termed a summator. The numerical codes with which the arithmetic unit operates are numbers of either sign, and the summation in question is understood as algebraic addition (taking the signs into account). In parallel machines the summation operation is usually performed in two elementary cycles of the machine. Here by an elementary cycle we mean the interval of time between two successive clock pulses applied to the AU. Most frequently the source of the clock pulses is the synchronizing generator, which is common to the entire machine and forms part of its control unit. There are also other methods of organizing the elementary cycles, used in the so-called asynchronous machines. These methods are described in detail in handbooks on electronic digital machines.

The operations performed in the various parts of a universal program automaton in the course of a single elementary cycle are customarily termed microoperations. More complex operations performed over several elementary cycles are realized by means of a set of microoperations, termed the microprogram of the given operation. The microprogram of the summation operation in the arithmetic units of parallel machines usually consists of two microoperations: the microoperation of place-by-place addition and the microoperation that realizes the carries from some places to others arising from the place-by-place addition. In this case it is assumed that one of the addends was set on the summator ahead of time and the other on one of the AU registers.
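The two-step microprogram of summation can be imitated in Python on fixed-width codes. This is a sketch under simplifying assumptions: the codes are treated as unsigned 8-place numbers (the sign handling of the algebraic summator is omitted), and the second microoperation, which a parallel machine performs in one elementary cycle with its carry circuits, is modelled here as a loop.

```python
def placewise_addition(x: int, y: int) -> tuple:
    """First microoperation: add each pair of places independently.

    Returns the placewise sums (without carries) and the places where a
    carry arose (both digits equal to 1).
    """
    return x ^ y, x & y


def propagate_carries(partial: int, carries: int, width: int) -> int:
    """Second microoperation, modelled as a loop: move each carry into the
    next higher place and add it there, until no carries remain. A carry
    out of the highest place is lost (overflow)."""
    mask = (1 << width) - 1
    while carries:
        shifted = (carries << 1) & mask
        partial, carries = partial ^ shifted, partial & shifted
    return partial


def add(x: int, y: int, width: int = 8) -> int:
    partial, carries = placewise_addition(x, y)
    return propagate_carries(partial, carries, width)


print(add(0b01011011, 0b00110110))   # 145, i.e. 91 + 54
```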

We can, moreover, construct the AU so that, once the addends have been set on the register and the summator, the addition is accomplished by only a single microoperation: the transfer of the number held on the register into the summator. In what follows we shall assume that the AU under discussion is constructed in just this way. The summators of such an AU are customarily termed accumulators, since they are capable of accumulating the sum of any number of terms when all the terms are transmitted to the summator sequentially, one after another.

Since the summator performs algebraic addition (taking the signs of the
addends into account), it is not difficult to organize subtraction on such
a summator as well: the code of the subtrahend is transferred from the
register to the summator as an ordinary addend, but with its sign reversed.
We therefore include in the set of microoperations of the universal digital
machines not only the conventional (direct) transfer of number codes but
also the transfer of a code with its sign changed. We must also provide
the register-clearing microoperations, after which the cleared registers
hold the number code representing the number 0.
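The accumulator behaviour described in the last two paragraphs can be sketched as a tiny Python model. It is a minimal illustration, assuming numbers are held as plain signed integers rather than n-bit codes; the class and method names are hypothetical.

```python
class Accumulator:
    """Hypothetical summator that adds every incoming code algebraically."""

    def __init__(self):
        self.value = 0  # a cleared summator holds the code of the number 0

    def clear(self):
        self.value = 0

    def transfer(self, code):
        # direct transfer from a register: the code is added to the contents
        self.value += code

    def transfer_negated(self, code):
        # transfer with the sign reversed: this is how subtraction is realized
        self.value += -code


acc = Accumulator()
for term in (5, -3, 7):     # accumulate several terms one after another
    acc.transfer(term)
total = acc.value           # 5 - 3 + 7 = 9
acc.transfer_negated(4)     # subtract 4 by sending it with its sign changed
difference = acc.value      # 9 - 4 = 5
```

No separate subtraction circuit appears in the model: subtraction is just a transfer with a sign change, as in the text.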

With the natural method of coding numbers, the microoperations described
so far are not sufficient for multiplication and division. Therefore,
along with them, we also include in the set of microoperations of the
universal digital machines the left-shift and right-shift microoperations
on the registers. A left shift replaces the number code $x_1 x_2 \ldots x_n$
set in the register by the code $x_2 x_3 \ldots x_n 0$, and a right shift
replaces it by the code $0 x_1 x_2 \ldots x_{n-1}$. The sign of the code
(not designated here) keeps its value. The digits on the right usually
represent the lower digits of the number and those on the left its higher
digits. Therefore we also say that a right shift moves the number code in
the direction of the lower digits, and a left shift in the direction of the
higher digits.
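Treating a code as a list of digits $x_1 \ldots x_n$ (highest digit first, with the sign kept separately as the text says), the two shift microoperations can be sketched as follows. The function names are illustrative only.

```python
def left_shift(digits):
    """x1 x2 ... xn -> x2 ... xn 0: shift toward the higher digits."""
    return digits[1:] + [0]


def right_shift(digits):
    """x1 x2 ... xn -> 0 x1 ... x(n-1): shift toward the lower digits."""
    return [0] + digits[:-1]


def fraction_value(digits):
    """Value of the proper binary fraction 0.x1 x2 ... xn."""
    return sum(d / 2 ** (i + 1) for i, d in enumerate(digits))


code = [0, 1, 0, 1]                       # binary 0.0101 = 0.3125
print(fraction_value(left_shift(code)))   # 0.625: doubled, no digit lost
print(fraction_value(right_shift(code)))  # 0.125: halved, lowest digit dropped
```

A left shift doubles the value only while $x_1 = 0$. A right shift halves it, discarding $x_n$, which is why repeated right shifts truncate.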

As the feedback signals transmitted from the AU to the CU we usually
select the signal giving the sign of the number code in the summator and
the signal giving the lowest-place digit of the number code in one of the
AU registers, which we denote P2. This register is also termed the
multiplier register. The second AU register has no feedback connection
with the CU. It is usually termed the multiplicand register and is
denoted P1.

As a rule, the MU contains only two registers: the number register and
the address register. The address register holds the address of the MU
cell with which an operation (writing or reading of a code) is to be
performed. The number register holds the number code being selected from
the MU or being sent there for storage. In addition, there are usually two
channels through which the MU receives signals from the CU on the nature
of the coming operation (writing or reading). A pulse transmitted along
one of these channels (the write channel) causes the code set in the
number register to be memorized (written) in the MU cell whose address
coincides with the code set in the address register. A pulse transmitted
along the other channel (the read channel) causes the code from the MU
cell whose address is set in the address register to be transferred to
the (previously cleared) number register.

These two MU microoperations are termed the MU read-in (writing) and
read-out (reading) microoperations, respectively. The MU also provides
the microoperation of clearing the MU registers, as well as
microoperations for transferring codes from the AU and CU registers into
the number and address registers and the reverse transfers from the MU
into the AU and CU. When we speak of transferring a code from register to
register, we always mean, unless stipulated otherwise, an ordinary
transfer, i.e., a transfer of the code without change of its sign.
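The two-register organization of the MU and its two channels can be sketched as a small Python class. Everything here is hypothetical: the class name, the method names `read_in` (write channel) and `read_out` (read channel), and the use of a Python list for the cells.

```python
class MemoryUnit:
    """Hypothetical MU with an address register and a number register."""

    def __init__(self, size):
        self.cells = [0] * size
        self.address_register = 0
        self.number_register = 0

    def clear_registers(self):
        self.address_register = 0
        self.number_register = 0

    def read_in(self):
        # pulse on the write channel: the number register's code is written
        # into the cell named by the address register (old contents erased)
        self.cells[self.address_register] = self.number_register

    def read_out(self):
        # pulse on the read channel: the addressed cell's code goes into the
        # previously cleared number register; the cell keeps its contents
        self.number_register = self.cells[self.address_register]


mu = MemoryUnit(8)
mu.address_register, mu.number_register = 3, 42   # as if sent from AU and CU
mu.read_in()
mu.clear_registers()
mu.address_register = 3
mu.read_out()
print(mu.number_register)   # 42
```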

Let us analyze the structure of the CU, following the microprogram control
scheme first described by Wilkes and Stringer [75]. The microprogram
control unit contains two registers, termed the command register and the
microoperation register. The command register stores the command
(instruction) currently being performed. In accordance with the accepted
instruction format, the command register is subdivided into several
registers: the operation register, the first address register, the second
address register, etc. In describing microprograms it is sometimes
convenient to treat the command register as a whole and sometimes to break
it up into its component parts.

The microoperation register, sometimes also termed the microcommand
register, stores the code of the microprogram instruction (microcommand)
being performed at the given moment, i.e., the code denoting the ensemble
of microoperations performed at that moment.

In addition to the command and microoperation registers, universal program
automata with the natural order of succession of instructions (see §1 of
the present chapter) have one more register, termed the command counter.
When a pulse is applied to a special input of this register, the number
code previously set in it is increased by one. If the command counter has
been cleared beforehand, it will therefore count the pulses arriving at its
input; this is where the register derives its name. The command counter
stores command addresses. During the execution of any instruction other
than a conditional transfer, the contents of the command counter are
increased by one and a new instruction is selected from the MU at the
address thus obtained.
Increasing the contents of the command counter by one is one of the CU
microoperations. Among the other microoperations provided in the CU we
note the clearing of the registers, the transfer of codes from the MU
number register into the command register, the transfer of codes from the
CU address registers (first, second, etc.) into the MU address register,
the transfer of a code from the command register into the summator (for
the readdressing operation), and the transfer of the code from one of the
CU address registers (usually the third) into the command counter (for the
conditional transfer operation). To perform the logical operations
mentioned in the preceding section, several new microoperations are added
to the set of microoperations of the arithmetic unit.

As for the microprogram control unit proper, in the Wilkes and Stringer
scheme it includes, in addition to the microoperation register mentioned
above, the so-called microoperation decoder and two diode matrices,*
termed the A-matrix and the B-matrix. A simplified symbolic circuit of the
microprogram control unit is shown in Fig. 19. In this figure POn denotes
the operation register (a component of the CU command register) and PMO
the microoperation register. The dots mark the points at which diodes
connect the horizontal buses of a matrix with its vertical buses. The
diodes pass pulses in the forward direction (from the horizontal buses to
the vertical ones) and block their passage in the reverse direction. Under
this condition a pulse applied to any horizontal bus D reaches those and
only those vertical buses with which D is connected by diodes. If the
buses were connected directly (without diodes), pulses could travel along
false paths not intended by the designer. For example, a pulse applied to
the second-from-the-top horizontal bus of the A-matrix would appear not
only on the first vertical bus from the left but also on the third from
the left, reaching it through the second-from-the-bottom horizontal bus.
The diodes rule out such false paths, since pulses cannot pass back from
the vertical buses to the horizontal ones.
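The false-path effect can be checked on a small model. This sketch uses a hypothetical 4-by-3 matrix, not the actual wiring of Fig. 19. It is chosen so that, as in the example, the second horizontal bus from the top and the second from the bottom share a vertical bus. With diodes, a pulse reaches only the directly connected verticals. With plain two-way junctions, it spreads through every bus connected to the starting one.

```python
from collections import deque

# Hypothetical wiring: (row, column) junctions; rows numbered from the top,
# columns from the left, both starting at 0.
CONNECTIONS = {(0, 1), (1, 0), (2, 0), (2, 2), (3, 1)}


def outputs_with_diodes(row):
    # a diode passes the pulse only from the horizontal to the vertical bus
    return {c for (r, c) in CONNECTIONS if r == row}


def outputs_without_diodes(row):
    # a plain junction conducts both ways, so the pulse reaches every vertical
    # bus in the connected component containing the starting horizontal bus
    seen_rows, seen_cols = {row}, set()
    queue = deque([("row", row)])
    while queue:
        kind, index = queue.popleft()
        for r, c in CONNECTIONS:
            if kind == "row" and r == index and c not in seen_cols:
                seen_cols.add(c)
                queue.append(("col", c))
            elif kind == "col" and c == index and r not in seen_rows:
                seen_rows.add(r)
                queue.append(("row", r))
    return seen_cols


print(outputs_with_diodes(1), outputs_without_diodes(1))   # {0} {0, 2}
```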

The decoder shown in Fig. 19 transmits each successive pulse SI of the
synchronizing generator applied to its input to exactly one horizontal bus
of the A-matrix. That bus is uniquely determined by the codes set at that
moment in the operation and microoperation registers. Some of the
horizontal buses of the A-matrix continue into one horizontal bus of the
B-matrix and some into two such buses. In the latter case, which of the
two B-matrix buses the pulse passes to is determined by a feedback signal
coming from the AU (one of the feedback signals shown in Fig. 19).

Passing to the vertical buses of the B-matrix (as determined by the
placement of the diodes), the pulses enter the microoperation register and
alter the code previously set there. Therefore the next SI pulse, passing
through the decoder, goes to a new horizontal bus. Suppose the vertical
buses of the A-matrix are connected to the corresponding units of the
machine so that a pulse on each vertical bus causes exactly one
microoperation to be performed. This process can then accomplish any
finite sequence of microoperations (a microprogram), in which several
microoperations may be combined into one microcommand.

Fig. 19. 1) Operation register; 2) microoperation register;
3) synchronizing pulse; 4) decoder; 5) A-matrix; 6) B-matrix.
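One way to picture the decoder and the two matrices is as a lookup table stepped once per synchronizing pulse. This is a minimal Python sketch, assuming a hypothetical operation code `"X"` and made-up microoperation names. Each table row plays the part of one A-matrix horizontal bus: it lists the microoperations to fire and the B-matrix successor code. That successor is a single code, a pair selected by a feedback signal, or `None` at the end of the operation.

```python
# Hypothetical microprogram store: the decoder selects the row named by
# (operation register contents, microoperation register contents).
ROWS = {
    ("X", 0): (("clear AU",), 1),
    ("X", 1): (("transfer P1 to summator",), (2, 3)),  # branch on feedback
    ("X", 2): (("right shift summator",), None),
    ("X", 3): (("left shift summator",), None),
}


def run_microprogram(rows, operation, feedback):
    """Issue the microcommands for one operation; feedback() returns 0 or 1."""
    micro_register = 0          # microoperation register at the start
    issued = []
    while micro_register is not None:
        micro_ops, successor = rows[(operation, micro_register)]  # decoder
        issued.append(micro_ops)          # A-matrix: perform microoperations
        if isinstance(successor, tuple):  # a bus that splits in the B-matrix
            successor = successor[feedback()]
        micro_register = successor        # B-matrix rewrites the register
    return issued


print(run_microprogram(ROWS, "X", lambda: 0))
# [('clear AU',), ('transfer P1 to summator',), ('right shift summator',)]
```

The table contents fix the whole behaviour of the unit. This mirrors the point made below: writing the microprograms determines the diode placement.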

By writing the microprograms for the various operations the machine must
perform, we determine how the diodes are connected in the A-matrix and the
B-matrix, and consequently the structure of the entire microprogram
control unit. While the microprogram of any machine operation is being
performed, the contents of the operation register remain unchanged,
whereas the contents of the microoperation register change with every new
elementary cycle. Only at the very end of the operation, after the new
instruction has been selected from the memory, do the contents of the
operation register change. The microprogram of the next operating cycle of
the machine then begins.

Since the structure of the control unit, and even of the entire machine,
is to a considerable degree determined by the choice of the operations and
their microprograms, let us consider concrete examples of microprograms
for the most common machine operations. For definiteness we shall assume
that the machine under discussion is a three-address parallel universal
digital machine with the natural order of succession of commands. It is
assumed to operate on n-place proper fractions in the binary system. Of
the two feedback signals, one gives the sign function (±1) of the number
in the summator. The other, denoted q, gives the value of the lowest place
of the number in the multiplier register.

Addition  Microprogram 

1.  Clear  AU  and  MU  registers. 

2.  Transfer  of  the  code  from  the  CU  first  address  register  into 
the  MU  address  register. 
3. MU read-out.

4.  Transfer  code  from  MU  number  register  into  AU  summator. 

5.  Clear  MU  registers. 

6.  Transfer  code  from  CU  second  address  register  into  MU  address 
register. 

7. MU read-out.

8.  Transfer  of  code  from  MU  number  register  into  AU  summator  (most 
frequently  via  the  AU  register  P2). 

9.  Clearing  of  MU  registers. 

10.  Transfer  of  code  from  AU  summator  into  MU  number  register. 

11.  Transfer  of  code  from  CU  third  address  register  into  MU  ad¬ 
dress  register. 

12.  MU  read-in. 

13.  Clearing  of  MU  registers. 

14.  Increase  of  command  counter  content  by  one. 

15.  Transfer  of  code  from  command  counter  into  MU  address  register. 

16.  MU  read-out. 

17.  Clearing  of  command  register. 

18.  Transfer  code  from  MU  number  register  into  command  register. 

Some of the microoperations making up this microprogram can be performed
simultaneously, which shortens the operating cycle and increases the
speed of the machine. Examples of such combinable microoperations are
microoperations 16 and 17, or microoperations 10 and 11.
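The net effect of the 18-step addition microprogram can be traced on a hypothetical register-level state. The sketch below makes several assumptions: the machine state is a Python dictionary, memory cells hold either numbers or command tuples `(op, a1, a2, a3)`, and the numbered steps are grouped in comments. Register clears are implicit because each register is overwritten. It illustrates the data flow only, not the circuitry.

```python
def perform_addition(state):
    """Trace of the addition microprogram on a dict-based machine state."""
    mem = state["memory"]
    _, a1, a2, a3 = state["command"]
    # steps 1-4: clear, read out the first operand into the summator
    state["summator"] = 0
    state["mu_address"] = a1
    state["mu_number"] = mem[state["mu_address"]]        # MU read-out
    state["summator"] += state["mu_number"]
    # steps 5-8: read out the second operand and add it in the summator
    state["mu_address"] = a2
    state["mu_number"] = mem[state["mu_address"]]        # MU read-out
    state["summator"] += state["mu_number"]
    # steps 9-12: write the sum into the cell named by the third address
    state["mu_number"] = state["summator"]
    state["mu_address"] = a3
    mem[state["mu_address"]] = state["mu_number"]        # MU read-in
    # steps 13-18: advance the command counter and fetch the next command
    state["counter"] += 1
    state["mu_address"] = state["counter"]
    state["command"] = mem[state["mu_address"]]          # MU read-out


memory = {1: 2, 2: 3, 3: 0, 5: ("addition", 1, 2, 3), 6: ("stop", 0, 0, 0)}
state = {"memory": memory, "command": memory[5], "counter": 5,
         "summator": 0, "mu_address": 0, "mu_number": 0}
perform_addition(state)
print(memory[3], state["counter"], state["command"])   # 5 6 ('stop', 0, 0, 0)
```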

Microprogram  for  Multiplication  by  Positive  Number 

1.  Clear  AU  and  MU  registers. 

2.  Transfer  code  from  CU  first  address  register  into  the  MU  ad¬ 
dress  register. 

3. MU read-out.

4. Transfer code from MU number register into AU register P1.

5.  Clear  MU  registers. 

6. Transfer code from CU second address register into MU address
register.

7. MU read-out.

8. Transfer code from MU number register into AU register P2.

9. (1) Transfer code from register P1 into summator if q = 1; if q = 0,
go to the following microoperation without number transfer.

10. (1) Right shift on summator.

11. (1) Right shift on register P2.

9. (2) Same as 9(1).

10. (2) Same as 10(1).

11. (2) Same as 11(1).

. . .

9. (n) Same as 9(1).

10. (n) Same as 10(1).

11. (n) Same as 11(1).

12. Clear MU registers.

13. Transfer code from CU third address register into MU address
register.

14. Transfer code from summator into MU number register.

15. MU read-in.

16. Clear MU registers.

17. Add one to content of command counter.

18. Transfer code from command counter into MU address register.

19. MU read-out, clear command register.

20. Transfer code from MU number register into command register.

In the multiplication microprogram, unlike the addition microprogram,
considerable use is made of the possibility of branching the order of
succession of microoperations as a function of the feedback signal q
coming from the AU. It is not difficult to see that performing
microoperations 9, 10, 11 in sequence n times is equivalent to performing
the usual, well-known binary multiplication algorithm with roundoff.

Indeed, in this technique the AU summator stores the sums of the partial
products of the multiplicand by the individual digits of the multiplier,
accurate up to the lower digits that are dropped by the right shifts of
the summator contents. At each step the multiplicand is either added or
not added to the partial sum stored in the summator, depending on whether
the lowest digit of the right-shifted multiplier is one or zero. Together
with the shifts on the summator and on register P2, this procedure adds
the product of the multiplicand by the next digit of the multiplier (taken
from the right) to the previous sum of such partial products, as the
conventional multiplication algorithm requires. Roundoff to the number of
significant digits contained in the factors results from the shifts on the
summator, which eliminate the lowest digits of the product. For
multiplication not only by positive but also by negative numbers, an
additional microoperation forming the sign of the product is introduced
into the microprogram.
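The loop formed by microoperations 9, 10, 11 can be sketched on integer codes. An n-place proper fraction is represented by the integer m standing for $m/2^n$. Only nonnegative factors are handled, as in the microprogram above, and the function name is hypothetical. Because each right shift drops the lowest digit, the result equals $\lfloor xy/2^n \rfloor$: the "roundoff" here is truncation.

```python
def multiply_fractions(x, y, n):
    """Product of the fractions x/2**n and y/2**n (x, y >= 0) as an n-place
    code, computed by n repetitions of microoperations 9, 10, 11."""
    summator = 0
    multiplier = y                # contents of register P2
    for _ in range(n):
        q = multiplier & 1        # feedback signal: lowest digit of P2
        if q == 1:
            summator += x         # 9: transfer P1 into the summator
        summator >>= 1            # 10: right shift on the summator
        multiplier >>= 1          # 11: right shift on register P2
    return summator


print(multiply_fractions(0b1011, 0b1101, 4))   # 8, i.e. 0.1000 (exact: 143/256)
```

Exactly $11/16 \cdot 13/16 = 143/256 \approx 0.559$; the 4-place result $8/16 = 0.5$ keeps only the digits that survive the shifts.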

Microprogram for Conditional Transfer on Inequality

1. Clear AU and MU registers.

2.  Transfer  code  from  CU  first  address  register  into  MU  address 
register. 

3. MU read-out.

4. Transfer code from MU number register into AU summator.

5.  Clear  MU  registers. 

6.  Transfer  code  from  CU  second  address  register  into  MU  address 
register. 

7. MU read-out.

8.  Transfer  code  from  MU  number  register  into  AU  register. 

9.  Transfer  code  with  sign  change  from  register  into  summator. 

10. If the summator content s ≥ 0, add one to the content of the command
counter; if s < 0, transfer code from CU third address register into
command counter.

11.  Clear  MU  registers. 

12.  Transfer  code  from  command  counter  into  MU  address  register. 

13. MU read-out.

14. Clear command register.

15. Transfer code from MU number register into command register.

This microprogram compares the number $A_1$ read from the first address of
the command with the number $A_2$ read from its second address. If
$A_1 \ge A_2$, control is transferred to the next command in order; if
$A_1 < A_2$, the next command is taken from the third address of the
current command. It is not difficult to see that two conditional transfers
on inequality, with the places of $A_1$ and $A_2$ interchanged in the
second, together form the operation of conditional transfer on exact
coincidence of words described in the preceding section.
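The last remark can be made concrete. In this sketch (function names hypothetical), an inequality transfer is modelled by the new value of the command counter. Two such commands at addresses p and p + 1, the second with its operands interchanged, reach address p + 2 only when the words coincide. The target address is assumed to differ from p + 1.

```python
def inequality_transfer(a1, a2, counter, third_address):
    """New command-counter value after 'conditional transfer on inequality'."""
    s = a1 - a2          # step 9: A2 sent to the summator with its sign changed
    # step 10: natural succession if s >= 0, jump to the third address if s < 0
    return counter + 1 if s >= 0 else third_address


def coincidence_transfer(a1, a2, p, target):
    """Inequality transfers at addresses p and p + 1, with A1 and A2
    interchanged in the second: control reaches p + 2 only if A1 == A2."""
    nxt = inequality_transfer(a1, a2, p, target)      # command at address p
    if nxt == target:                                 # A1 < A2
        return target
    return inequality_transfer(a2, a1, nxt, target)   # command at p + 1


print(coincidence_transfer(5, 5, 10, 40), coincidence_transfer(5, 3, 10, 40))
# 12 40
```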

The basic principles of organizing the algorithmic process in universal
electronic digital machines described above give a general idea of the
so-called block structure of such machines. In the actual design of
electronic program automata, the stage of block synthesis is only the
starting point for the development of particular circuit solutions. These
solutions are chosen on the basis of the theory of automata and the
theory of combinational circuits presented in Chapters 1 and 3.

§3. THE CONCEPT OF PROGRAMMING

Programming is the writing of a particular algorithm in the form of a
finite sequence of instructions (commands) for a universal program
automaton. Such a sequence is termed the operating program of the given
automaton. Entered into the automaton together with the initial data (the
input word of the algorithm), it forces the automaton to carry out the
algorithm in question, i.e., to transform the given input word (input
information) into the corresponding output word (output information).
The universality of the set of operations of modern program automata
makes it possible to program any algorithm, provided we neglect the
limitations imposed by the finite size of the automaton memory.

To understand the essence of the problems arising in programming, let us
first consider a simple particular example. Suppose we are required to
calculate a sum of the form

$$\sum_{k=1}^{m} \frac{1}{k^2},$$

where m is some fixed natural number. We shall compile the program for
this problem for a three-address universal digital machine with the
natural order of succession of commands. Its set of operations includes
all four arithmetic operations (addition, subtraction, multiplication and
division), the operation of conditional transfer on exact coincidence of
words, and the machine stop operation.

For simplicity, let us assume that there is no need for input of
information into the machine MU or output from it. In other words, the
initial information and the program to be compiled are assumed to have
been entered into the proper memory cells already. After the machine
stops, the solution of the problem is found in the MU cells assigned for
this purpose (in the present case, a single cell).

To solve the problem we introduce the following notation:

$$x_k = k, \qquad y_k = x_k^2, \qquad z_k = \frac{1}{y_k},$$

$$s_k = z_1 + z_2 + \cdots + z_k = s_{k-1} + z_k.$$

We shall assume that four memory cells, termed the working cells, are
assigned for storing the quantities $x_k$, $y_k$, $z_k$ and $s_k$. It is
natural to break the computation of the desired sum $s_m$ into m
completely identical steps, in each of which we compute the corresponding
partial sum $s_k$ ($k = 1, 2, \ldots, m$). The computations start with the
values $k = 1$ and $s_0 = 0$ and can be written as the following scheme:

1) $y_k = x_k \cdot x_k$;

2) $z_k = 1 / y_k$;

3) $s_k = s_{k-1} + z_k$;

4) $x_{k+1} = x_k + 1$;

5) if $x_{k+1} = m + 1$, go to the following instruction; if
$x_{k+1} \ne m + 1$, return to instruction 1;

6) stop.

This scheme is in fact already the desired program. The only difference
is that its instructions use the symbolic designations $x_k$, $y_k$, $z_k$,
$s_k$, termed symbolic addresses, in place of the actual addresses of the
working cells. Once the actual addresses of these cells are fixed, along
with those of the cells containing the constants 1 and m + 1 that appear
in the program, it is not difficult to go from the program with symbolic
addresses to the actual program in three-address instructions. Here we
shall assume that the conditional transfer instruction gives the natural
succession of instructions when the compared words coincide. These words
are located at the first and second addresses of the instruction. When
they do not coincide, control passes to the third address.

Let the addresses of the working cells $x_k$, $y_k$, $z_k$ and $s_k$ be 1,
2, 3 and 4, respectively; let the cells containing the constants 1 and
m + 1 have the addresses 5 and 6; and let the first program instruction be
in the cell with the address 7. Then the desired program (in actual
addresses) is written as the following sequence of instructions:

1)  multiplication  1,  1,  2; 

2)  division  5,  2,  3; 

3)  addition  4,  3,  4; 

4)  addition  1,  5,  1; 

5)  conditional  transfer  1,  6,  7; 

6)  stop. 

The cell with the address 1 must initially contain the number one, and
the cell with the address 4 the number zero. The initial contents of the
working cells with the addresses 2 and 3 evidently do not matter, since the
required values are written into them by the first and second
instructions of the program. Recall that the machine memory is presumed to
be constructed so that writing any word into any cell erases the previous
contents of that cell. After the machine stops (on the sixth instruction),
the sought value of the sum $s_m$ is found in the fourth memory cell.
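The six-instruction program in actual addresses can be checked with a small interpreter. In this sketch the operation names are hypothetical strings, and `Fraction` values stand in for the machine's number codes so that the result is exact. The conditional transfer follows the convention stated above: natural succession on coincidence, a jump to the third address otherwise.

```python
from fractions import Fraction


def run_three_address(memory, start):
    """Interpret commands (operation, a1, a2, a3) stored in `memory`."""
    counter = start
    while True:
        op, a1, a2, a3 = memory[counter]
        if op == "stop":
            return memory
        if op == "multiplication":
            memory[a3] = memory[a1] * memory[a2]
        elif op == "division":
            memory[a3] = memory[a1] / memory[a2]
        elif op == "addition":
            memory[a3] = memory[a1] + memory[a2]
        elif op == "conditional transfer" and memory[a1] != memory[a2]:
            counter = a3        # words differ: control goes to the third address
            continue
        counter += 1            # natural succession of commands


m = 3
memory = {1: Fraction(1), 2: 0, 3: 0, 4: Fraction(0),   # x_k, y_k, z_k, s_k
          5: Fraction(1), 6: Fraction(m + 1),           # constants 1 and m + 1
          7: ("multiplication", 1, 1, 2),
          8: ("division", 5, 2, 3),
          9: ("addition", 4, 3, 4),
          10: ("addition", 1, 5, 1),
          11: ("conditional transfer", 1, 6, 7),
          12: ("stop", 0, 0, 0)}
print(run_three_address(memory, 7)[4])   # 49/36 = 1 + 1/4 + 1/9
```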

It is not difficult to see that in this program the cell with the address
2 can be used for storing not only the quantity $y_k$ but also the
quantity $z_k$. Indeed, $y_k$ is used only to calculate $z_k$. Therefore,
once $z_k$ has been calculated, its value can be sent to the cell
previously holding $y_k$ without endangering the correctness of the
subsequent calculations. There exist general techniques that allow such
economizing of working cells to be automated.

As the second example, let us consider programming the calculation of the
scalar product of two n-dimensional vectors $A = (a_1, a_2, \ldots, a_n)$
and $B = (b_1, b_2, \ldots, b_n)$. This is the sum s of the products of
their corresponding components:
$s = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$. Let us assume that the
components of the vector A are arranged in the cells with the addresses
a + 1, a + 2, ..., a + n, and the components of the vector B in the cells
with the addresses b + 1, b + 2, ..., b + n. Here a and b are fixed natural
numbers chosen so that the arrays of cells assigned to the components of A
and B do not overlap.

For the sake of variety, let us compile the program for the scalar
product for a single-address machine with the natural order of succession
of commands. In this case an instruction realizing an arithmetic operation
performs that operation on a pair of numbers. The first number is in the
AU summator, and the second is in the cell whose address is indicated in
the instruction. The result of the operation remains in the summator. To
perform arithmetic operations with such instructions, we also need
instructions that exchange codes between the AU summator and the MU cell
whose address is indicated in the instruction.

The conditional transfer command can be performed in various ways. Let us
assume that it is a conditional transfer on zero in the summator. If the
AU summator holds the number 0 when the conditional transfer command is
performed, control passes to the next command in order. Otherwise the next
command is selected from the address indicated in the conditional transfer
command. For simplicity, we assume that the i-th command of the program is
stored in the cell with the address i (i = 1, 2, ...). We assign the cell
with the address c to the readdressing constant (a number equal to one),
the cell with the address α to the number n (the dimension of the vectors
A and B), and the cell with the address β to the partial sum
$s_k = a_1 b_1 + a_2 b_2 + \cdots + a_k b_k$. We then arrive at the
following program:

1) transfer to summator from MU, a + 1;

2) multiply, b + 1;

3) add, β;

4) transfer from summator to MU, β;

5) transfer to summator, 1;

6) add, c;

7) transfer from summator to MU, 1;

8) transfer to summator, 2;

9) add, c;

10) transfer from summator to MU, 2;

11) transfer to summator, t;

12) add, c;

13) transfer from summator to MU, t;

14) transfer to summator, α;

15) subtract, t;

16) conditional transfer on zero in summator, 1;

17) stop.

The cell with the address t that appears in this program is the so-called
program cycle counter. Instructions 11, 12 and 13 increase the contents of
this cell by one. If the cell t holds zero at the start of the
computation, then after n cycles (repetitions of the group of instructions
1-13) it will hold the number n. The subtraction performed by
instructions 14 and 15 will then give zero in the summator, and the
subsequent conditional transfer command will pass control to command 17,
which stops the machine.

Until that moment the conditional transfer command passes control to the
first command, so the computation cycle is repeated. This is not, however,
a literal repetition, since commands 5, 6, 7 and 8, 9, 10 increase the
addresses of the first and second commands of the program by one.
Therefore commands 1 and 2 form the product $a_k b_k$ of a new pair of
components of the vectors A and B. Commands 3 and 4 then compute the new
value of the partial sum $s_k = s_{k-1} + a_k b_k$ and store it in the
cell β.

To understand this program properly, note the following convention. The
operation of transferring any code into the summator presupposes that the
summator is cleared beforehand. After a code is transferred from the
summator to the MU, the summator is also cleared automatically. In
addition, at the beginning of the computation the cell t must hold the
number zero, and so must the cell holding the partial sum.
The described method of readdressing (changing the addresses in the commands) is inconvenient because it alters the codes of the commands themselves. First, it lengthens the program (this is particularly noticeable in single-address machines); second, and most important, it alters the originally given program, which cannot be used again until the original values of the address portions of its commands have been restored. This restoration further complicates the program and requires additional MU cells for storing the original addresses. For this reason most modern universal digital machines use another method of readdressing, based on the so-called address modification registers, or index registers.
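The cycle just described can be imitated on a toy machine. The following is a minimal sketch under stated assumptions, not the original listing: the opcode names, the encoding of an instruction as opcode × 1000 + address, the cell addresses and the hypothetical cell ONE holding the constant 1 are all illustrative, and commands 1-10 are only one plausible reconstruction, since the beginning of the listing is not reproduced here. Instructions live in the same memory as the data, so commands 5-10 change the address parts of commands 1 and 2 by ordinary arithmetic, and the program is left altered after it stops.

```python
# Hypothetical single-address machine: an instruction word is opcode * 1000 + address,
# stored in the same memory as the data, so arithmetic can change its address part.
LOAD, ADD, SUB, MUL, STORE, JNZ, STOP = range(1, 8)


def word(op, addr=0):
    return op * 1000 + addr


def run(mem, start=1, max_steps=100_000):
    acc = 0                      # the summator
    pc = start
    for _ in range(max_steps):
        op, addr = divmod(mem[pc], 1000)
        pc += 1
        if op == LOAD:           # transfer to summator (the summator is cleared first)
            acc = mem[addr]
        elif op == ADD:
            acc += mem[addr]
        elif op == SUB:
            acc -= mem[addr]
        elif op == MUL:
            acc *= mem[addr]
        elif op == STORE:        # transfer from summator to MU; the summator is then cleared
            mem[addr], acc = acc, 0
        elif op == JNZ:          # conditional transfer: go to addr unless the summator is zero
            if acc != 0:
                pc = addr
        elif op == STOP:
            return mem
    raise RuntimeError("step limit exceeded")


def scalar_product(a, b):
    """Return (sum of a_i * b_i, how far the address in command 1 has drifted)."""
    n = len(a)
    A, B, S, T, ONE = 100, 200, 300, 301, 302       # hypothetical addresses; cell A holds n
    mem = {A: n, S: 0, T: 0, ONE: 1}
    for i in range(1, n + 1):
        mem[A + i], mem[B + i] = a[i - 1], b[i - 1]
    program = [
        word(LOAD, A + 1), word(MUL, B + 1),            # 1-2: form a_i * b_i
        word(ADD, S), word(STORE, S),                   # 3-4: s := s + a_i * b_i
        word(LOAD, 1), word(ADD, ONE), word(STORE, 1),  # 5-7: address of command 1 += 1
        word(LOAD, 2), word(ADD, ONE), word(STORE, 2),  # 8-10: address of command 2 += 1
        word(LOAD, T), word(ADD, ONE), word(STORE, T),  # 11-13: cycle counter t += 1
        word(LOAD, A), word(SUB, T),                    # 14-15: n - t
        word(JNZ, 1),                                   # 16: back to 1 while n - t != 0
        word(STOP),                                     # 17: stop
    ]
    for addr, w in enumerate(program, start=1):
        mem[addr] = w
    initial = mem[1]
    run(mem)
    return mem[S], mem[1] - initial
```

For example, `scalar_product([1, 2, 3], [4, 5, 6])` gives `(32, 3)`: the second number shows that the address of command 1 has been shifted by n, which is exactly why the program must be restored before it can be reused.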

The index registers are part of the control unit of the universal program automaton. They have the property that, during the execution of any command, the content of a particular index register is automatically added to those command addresses that carry a special label corresponding to this register.

With  the  use  of  a  single  index  register  I  in  the  three-address 
command  system,  the  program  for  the  computation  of  the  scalar  product 
of  the  vectors  can  be  written  with  only  five  commands  in  all  (designa¬ 
tions  of  the  cell  addresses  are  the  same  as  in  the  preceding  program): 

1) add 1 to the content of the index register I;

2) multiply, a(I), b(I), ξ;

3) add, ξ, s, s;

4) conditional transfer, I, a, 1;

5) stop.

This program uses a conditional transfer on exact coincidence of the codes in the index register I and in the cell a (where the dimension n of the vectors A and B is written). If these codes coincide, control passes to the next command in sequence (the fifth); if they do not, control is transferred to the first command of the program.

Throughout its operation the program retains its initial form; only the content of the index register I changes. If necessary, the program may include a special command that clears the index register, setting it to zero.
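A minimal sketch of the five-command program, assuming a hypothetical encoding in which each address is a pair (base, tagged) and a tagged address has the content of the index register I added to it at execution time. The cell addresses and the name XI for the working cell are illustrative assumptions; as in the text, cell a holds n and the components are at a+1, ..., a+n and b+1, ..., b+n. The program table itself is never modified; only the register changes.

```python
def scalar_product_indexed(a, b):
    """Five-command scalar product with one index register I (hypothetical encoding)."""
    n = len(a)
    A, B, S, XI = 100, 200, 300, 301          # hypothetical addresses; cell A holds n
    mem = {A: n, S: 0, XI: 0}
    for i in range(1, n + 1):
        mem[A + i], mem[B + i] = a[i - 1], b[i - 1]

    # An address is (base, tagged); a tagged address gets the index register added.
    program = {
        1: ("inc",),                                      # add 1 to index register I
        2: ("mul", (A, True), (B, True), (XI, False)),    # multiply, a(I), b(I), xi
        3: ("add", (XI, False), (S, False), (S, False)),  # add, xi, s, s
        4: ("jne", A, 1),                                 # conditional transfer, I, a, 1
        5: ("stop",),
    }
    index = 0                                 # the index register, cleared beforehand

    def effective(addr):
        base, tagged = addr
        return base + index if tagged else base

    pc = 1
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "inc":
            index += 1
        elif op in ("mul", "add"):
            x, y, dst = args
            u, v = mem[effective(x)], mem[effective(y)]
            mem[effective(dst)] = u * v if op == "mul" else u + v
        elif op == "jne":                     # noncoincidence -> target; coincidence -> next
            cell, target = args
            if index != mem[cell]:
                pc = target
        elif op == "stop":
            return mem[S], index
```

Calling `scalar_product_indexed([1, 2, 3], [4, 5, 6])` should give `(32, 3)`, the final value 3 being the content left in the index register.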

In programming more complex algorithms, for example the multiplication of a vector by a matrix, several index registers are needed to store the readdressing constants, which change at different steps. As an example, consider the multiplication of the n-dimensional vector $B = (b_1, b_2, \ldots, b_n)$ by the matrix A of order n with the elements $a_{ik}$ $(1 \le i \le n,\ 1 \le k \le n)$.

Assume that the successive components of the vector B are located in the memory cells with the addresses b+1, b+2, ..., b+n, and the elements $a_{ik}$ of the matrix A in the cells with the addresses $a + (k-1)n + i$ $(i, k = 1, 2, \ldots, n)$. The components $c_k = \sum_{i=1}^{n} b_i a_{ik}$ of the vector C = BA are to be placed in the cells with the addresses c+1, c+2, ..., c+n (these cells initially contain zeros). The cell with the address t serves as a working cell for storing intermediate results (its initial content does not matter). Finally, the cells with the addresses 1, 2, ... hold the successive instructions of the required program.

Let us introduce three index registers I1, I2, I3, which must be cleared before operation begins, and place in the cell with the address d the number n, equal to the dimension of the vector B and the order of the matrix A. In this notation the required program for multiplying the vector by the matrix is written as follows:

1) add 1 to the content of the index register I3;

2) add 1 to the content of the index register I1;

3) add 1 to the content of the index register I2;

4) multiply, b(I1), a(I2), t;

5) add, c(I3), t, c(I3);

6) conditional transfer, I1, d, 2;

7) clear index register, I1;

8) conditional transfer, I3, d, 1;

9) stop.
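The nine commands can be checked with a small simulation. This is a sketch under assumptions: the tuple encoding of commands, the numeric base addresses and the names used for them are hypothetical, and command 6 is taken to compare I1 with d (with I3 there, the inner cycle over i would not terminate for k < n). Register I3 selects the component c_k, I1 runs over i and is cleared after each column, and I2 runs through all $n^2$ matrix cells in storage order.

```python
def vector_times_matrix(b, a_rows):
    """C = BA with three index registers; a_rows[i][k] is a_ik (0-based in Python)."""
    n = len(b)
    B_BASE, A_BASE, C_BASE, T, D = 100, 200, 400, 500, 501   # hypothetical addresses
    mem = {T: 0, D: n}
    for i in range(1, n + 1):
        mem[B_BASE + i] = b[i - 1]
        for k in range(1, n + 1):
            mem[A_BASE + (k - 1) * n + i] = a_rows[i - 1][k - 1]   # a_ik at a + (k-1)n + i
    for k in range(1, n + 1):
        mem[C_BASE + k] = 0
    reg = {1: 0, 2: 0, 3: 0}                  # index registers, cleared before the start

    def effective(addr):
        base, r = addr                        # r is None for an untagged address
        return base if r is None else base + reg[r]

    program = {
        1: ("inc", 3),
        2: ("inc", 1),
        3: ("inc", 2),
        4: ("mul", (B_BASE, 1), (A_BASE, 2), (T, None)),
        5: ("add", (C_BASE, 3), (T, None), (C_BASE, 3)),
        6: ("jne", 1, D, 2),                  # I1 != n -> command 2, else next
        7: ("clear", 1),
        8: ("jne", 3, D, 1),                  # I3 != n -> command 1, else next
        9: ("stop",),
    }
    pc = 1
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "inc":
            reg[args[0]] += 1
        elif op == "clear":
            reg[args[0]] = 0
        elif op in ("mul", "add"):
            x, y, dst = args
            u, v = mem[effective(x)], mem[effective(y)]
            mem[effective(dst)] = u * v if op == "mul" else u + v
        elif op == "jne":
            r, cell, target = args
            if reg[r] != mem[cell]:
                pc = target
        elif op == "stop":
            return [mem[C_BASE + k] for k in range(1, n + 1)]
```

For B = (1, 2) and A with rows (1, 2) and (3, 4), the call `vector_times_matrix([1, 2], [[1, 2], [3, 4]])` should give `[7, 10]`.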

As algorithms become more complex, programming becomes more and more difficult. This naturally suggests looking for more economical ways of writing down an algorithm and using the universal program automaton itself to translate such records automatically into working programs. This idea underlies automatic programming by means of the so-called universal programming programs (translators).

A universal programming program is an algorithm, programmed for a particular universal digital machine, that translates the statement of any algorithm in a particular formal algorithmic language into the instruction language of that machine. As the formal algorithmic language we could, of course, select any of the languages described in Chapter 1, for example the language of normal algorithm schemes. Such a choice, however, would complicate rather than simplify the construction of the programming program, since stating an algorithm in any of the abstract algorithmic schemes of the first chapter is, as a rule, a considerably more difficult problem than programming in the instruction language of modern universal digital machines. To see this, it is enough to consider the addition operation: a universal digital machine performs it with a single instruction, while with normal algorithms, for example, it is realized by a rather complicated scheme containing many elementary substitutions.

Attempts have therefore been made to develop universal algorithmic languages that retain the basic properties of the instruction languages of modern universal digital machines but permit simpler and more readable statements of the algorithms encountered in practice than direct programming in "machine" languages. Among such languages are, for example, Fortran (U.S.), the Polish algorithmic language SAKO, the address language (Kiev, USSR), and others.

Practical algorithmic languages are important not only because they facilitate programming, but also because a sufficiently well-developed practical algorithmic language can become a generally accepted and generally understood language for writing algorithms of all kinds. Thus the ALGOL-60 language, developed by a group of European and American scientists, has now gained wide international acceptance. A detailed description of this language is given in the following section; here we shall consider certain techniques that facilitate direct programming in machine languages.

The first technique, already considered above, is to use symbolic addresses instead of actual (numerical) addresses in the initial stage of programming. The subsequent assignment of actual values to the symbolic addresses, together with economizing on working cells, is a purely technical operation and is easily automated. Despite its simplicity, the method of symbolic addresses considerably simplifies the programming of complex problems and, most important, considerably reduces the number of errors.
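The assignment of actual addresses to symbolic ones is mechanical enough to sketch in a few lines. This is a minimal illustration, assuming a hypothetical instruction format (label or None, operation, operands...): labelled instructions receive the number of the cell they occupy, and every other symbolic operand receives a working cell placed right after the program. The sketch makes no attempt at the economy of working cells mentioned above.

```python
def assemble(program, origin=1):
    """Replace symbolic operands by actual addresses; return (code, symbol table)."""
    table = {}
    # Pass 1: a labelled instruction gets the number of the cell it will occupy.
    for offset, (label, *_rest) in enumerate(program):
        if label is not None:
            table[label] = origin + offset
    # Pass 2: remaining symbols become working cells placed after the program,
    # in order of first appearance; integer operands are already actual addresses.
    next_free = origin + len(program)
    code = []
    for _label, op, *operands in program:
        actual = []
        for x in operands:
            if isinstance(x, str):
                if x not in table:
                    table[x] = next_free
                    next_free += 1
                x = table[x]
            actual.append(x)
        code.append((op, *actual))
    return code, table
```

For a three-instruction fragment `[("loop", "add", "x", "y", "t"), (None, "jne", "t", "n", "loop"), (None, "stop")]`, the label `loop` becomes 1 and the working cells x, y, t, n become 4, 5, 6, 7.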

A second method that significantly simplifies direct programming is to include previously constructed simpler programs in more complex ones. Programs specially adapted for inclusion in more complex programs are usually termed subprograms. By accumulating a library of subprograms, the programmer can in many cases reduce direct programming to combining a small number of available subprograms. To make it easier to combine several subprograms into a single program, special techniques have been developed that allow subprograms to be included in quite diverse programs without being changed. The difficulty is that the last instruction of a subprogram must transfer control to some instruction of the main program; this instruction changes from program to program and is unknown to whoever writes the subprogram.

This difficulty can be overcome as follows: at the moment of transfer to the subprogram, the address of the instruction to which the machine must go after completing the subprogram is sent to a special memory cell termed the return register. The subprogram must then end with a special instruction, "transfer on the return register," which takes the next command from the cell whose address is stored in this register. We note that, in order to be able to use any cell of the machine's operational memory as the return register, it is advisable to introduce memory referral by means of so-called second-rank addresses. With such referral, reading from memory or writing into memory is performed not at the addresses indicated in the command being executed, but at the addresses stored in the memory cells whose addresses are indicated in this command. When subprograms are included in other subprograms (rather than in the main program), several return registers or second-rank address transfers must be used.
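The return-register technique can be sketched with a toy interpreter. The operation names `call`, `jump_via`, `emit` and `halt`, the return cells 900 and 901 and the trace messages are all hypothetical. `call` stores the return address in a chosen memory cell before entering the subprogram, and `jump_via` is the "transfer on the return register": it takes the next command address from that cell, i.e. a second-rank address. The nested subprogram uses its own return register, as the text requires.

```python
def run(program, mem):
    """Toy interpreter: returns the list of messages emitted along the way."""
    pc, trace = 0, []
    while True:
        op, *args = program[pc]
        if op == "call":              # store the return address, then enter the subprogram
            ret_cell, entry = args
            mem[ret_cell] = pc + 1
            pc = entry
        elif op == "jump_via":        # transfer on the return register (second-rank address)
            (ret_cell,) = args
            pc = mem[ret_cell]
        elif op == "emit":
            trace.append(args[0])
            pc += 1
        elif op == "halt":
            return trace
        else:
            raise ValueError(f"unknown operation {op!r}")


R1, R2 = 900, 901                     # two hypothetical return registers
program = {
    0: ("emit", "main 1"),
    1: ("call", R1, 10),              # main program calls the outer subprogram
    2: ("emit", "main 2"),
    3: ("call", R1, 10),              # ...and calls it again from another place
    4: ("halt",),
    10: ("emit", "outer sub"),
    11: ("call", R2, 20),             # nested call: needs a second return register
    12: ("jump_via", R1),
    20: ("emit", "inner sub"),
    21: ("jump_via", R2),
}
```

Running `run(program, {})` should produce `["main 1", "outer sub", "inner sub", "main 2", "outer sub", "inner sub"]`. If the inner call also used R1, the outer subprogram would return to its own instruction 12 and loop forever, which is why nested subprograms need separate return registers.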

Such use of some subprograms within others is the basis for a multistage organization of systems of standard programs. How rational such an organization is depends on how economically the library of standard programs is arranged in the various memory devices. This question becomes particularly important when various kinds of standard subprograms, which enrich the set of operations performed by the machine, are realized in hardware. In this case a major economy of programmers' work is achieved, since they can use large subprograms, each of which is called by a single machine instruction. Such a multistage organization of control is realized in the "Promin" computer of the Institute of Cybernetics of the Academy of Sciences of the Ukrainian SSR (Kiev).

Moreover, even without hardware realization, a sufficiently extensive library of standard subprograms considerably facilitates programming, since a considerable portion of the new programs being compiled will, as a rule, consist of previously programmed standard portions available in the library.

We note that in compiling a library of standard subprograms an attempt is made to give the subprograms a high degree of generality with respect to the problems being solved. For example, the standard subprogram for matrix multiplication is written for matrices of any order rather than for one particular order. Other subprograms are constructed similarly.


Direct programming is also greatly assisted by first writing the program in a simplified form, usually termed the program block diagram, and then programming each individual block. A convenient method for writing program block diagrams is the operator method proposed by Lyapunov.

In this method, groups of successive program commands of a single type (for example, commands that realize arithmetic operations) are combined into so-called operators. The most widely used are the arithmetic operators and the readdressing operators (which change the content of the index registers). We label the arithmetic operators with the letter A, the logical operators with the letter P, the readdressing operators with the letter I, and the stop operator with the letter F. In addition, the operators are numbered with indices in the order in which they occur in the program.

Each arithmetic operator, combining a group of arithmetic operations, is a coded designation for a computation by a particular, often quite complex, formula. A logical operator checks the logical conditions on which particular conditional transfers depend (transfers of control that violate the natural order of succession of commands in the program). A vertical bar is placed after a logical operator; above and below this bar are written the numbers of the operators to which control is transferred when the logical condition is satisfied and when it is not satisfied, respectively. If no number stands above or below the bar, then in the corresponding case control is transferred to the operator standing directly to the right of the bar.

With the aid of this symbolism, the operator diagram of the last program we considered can be written as

$$I_1\, I_2\, A_3\, P_4 \underset{2}{\big|}\, I_5\, P_6 \underset{1}{\big|}\, F_7$$

Here the readdressing operator $I_1$ corresponds to the first instruction of the program, and the operator $I_2$ to the two following instructions. The arithmetic operator $A_3$ combines the fourth and fifth instructions, and each of the remaining operators consists of one instruction.

To make operator diagrams easier to read, the bars that designate conditional transfers can be provided with horizontal lines above and below, directed to the left. In this case a bar with a line directed to the right, labelled with the same number as the corresponding bar of the conditional transfer operator, is also placed before the operator to which control is transferred. A group of operators forming a cycle, which is repeated several times as a result of the conditional transfers, is then framed on both sides by a kind of "brackets" that make such cycles easier to find.

Using these notations, the operator diagram described above can be rewritten as

$$\lfloor_{1}\, I_1\, \lfloor_{2}\, I_2\, A_3\, P_4 \rfloor_{2}\, I_5\, P_6 \rfloor_{1}\, F_7$$
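The operator scheme of the vector-by-matrix program can itself be executed directly, which makes the transfer convention concrete. In this sketch (the state dictionary and helper names are assumptions, not part of the method) a logical operator carries the number of the operator to go to when its condition is not satisfied; when it is satisfied, control passes to the operator on the right. The seven operators are: readdress I3; readdress I1 and I2; the arithmetic step; the test "I1 = n" (otherwise operator 2); clear I1; the test "I3 = n" (otherwise operator 1); stop.

```python
def run_scheme(scheme, state):
    """Walk an operator scheme; operators are numbered 1, 2, ... in order."""
    pos = 1
    while True:
        kind, action, on_false = scheme[pos - 1]
        if kind == "stop":
            return state
        if kind == "P":             # logical operator: satisfied -> operator to the right
            pos = pos + 1 if action(state) else on_false
        else:                       # arithmetic (A) or readdressing (I) operator
            action(state)
            pos += 1


def vector_times_matrix_scheme(b, a_rows):
    n = len(b)
    st = {"I1": 0, "I2": 0, "I3": 0, "c": [0] * n,
          # a_ik stored column by column, as at the addresses a + (k - 1)n + i
          "a": [a_rows[i][k] for k in range(n) for i in range(n)]}

    def bump(*registers):
        def act(s):
            for r in registers:
                s[r] += 1
        return act

    def multiply_add(s):            # c_(I3) += b_(I1) * a(I2)
        s["c"][s["I3"] - 1] += b[s["I1"] - 1] * s["a"][s["I2"] - 1]

    def clear_i1(s):
        s["I1"] = 0

    scheme = [
        ("I", bump("I3"), None),              # operator 1
        ("I", bump("I1", "I2"), None),        # operator 2
        ("A", multiply_add, None),            # operator 3
        ("P", lambda s: s["I1"] == n, 2),     # operator 4: not satisfied -> 2
        ("I", clear_i1, None),                # operator 5
        ("P", lambda s: s["I3"] == n, 1),     # operator 6: not satisfied -> 1
        ("stop", None, None),                 # operator 7
    ]
    return run_scheme(scheme, st)["c"]
```

The two backward transfers (to operators 2 and 1) are exactly the cycles that the bracket notation frames; `vector_times_matrix_scheme([1, 2], [[1, 2], [3, 4]])` should return `[7, 10]`.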
By supplying the operator diagram of a program with a description of each of its operators (other than the stop operator), we can, once the diagram has been compiled, proceed to program these operators individually and in sequence, and then combine the separately compiled pieces of the program into a single whole. These operations are largely routine and can be automated relatively easily with the aid of any universal program automaton.

We note that the arithmetic and logical operators can be described by ordinary arithmetic or logical formulas. Given special programs for automatically translating such formulas into the machine instruction language, combined with standard library subprograms, it becomes possible to present a task to the machine (the universal program automaton) in the same form in which it is presented to a skilled human computer. This method in fact combines the method of standard subprograms with the method that uses (to a certain extent) universal programming programs. It is therefore natural to call it the specialized programming program method, or the programming program library method [24]. The specialization consists in the fact that the corresponding library is oriented to a certain class of typical problems, which makes it possible in practice to eliminate programming completely and to communicate to the machine only the conditions of the problem to be solved.

§4.  THE  UNIVERSAL  ALGORITHMIC  LANGUAGE  ALGOL-60 

The international algorithmic language ALGOL-60, which for brevity we shall call simply ALGOL, is a means for writing computational algorithms simply, precisely and clearly. As a universal algorithmic language it is, of course, suitable for writing any algorithms (not necessarily computational ones); however, when literal rather than numerical information is processed, the simplicity and clarity of the corresponding ALGOL notation are to a considerable degree lost. While programming computational algorithms in ALGOL is a far simpler problem than direct programming for modern universal electronic digital machines, programming problems of literal information processing in ALGOL is only slightly simpler than using the "machine" languages.

The basic symbols used in constructing the ALGOL language are the Latin letters (26 capital and 26 lower-case letters), the Arabic numerals (from zero to nine inclusive), the logical values "true" and "false", and also the operation symbols, separator symbols and brackets (the last three types of symbols are termed delimiters). There are also a certain number of service words, for which English words are usually used. It is customary to write these words in boldface type.

Numbers are written in the decimal system, with the integer part separated from the fractional part by a point (and not by a comma). The plus sign before positive numbers and the zero in the integer part of a proper fraction may be omitted. To designate a decimal exponent (the integral power of ten by which the number is multiplied) a special symbol is used: a ten lowered below the base line, ⏨ (it is usually printed in boldface type). The numbers used in ALGOL are divided into two types: integer and real. The integer type includes only whole numbers (with or without sign) whose notation contains neither a decimal exponent symbol nor a decimal point; all other numbers belong to the real type (thus the number 3.0 is real, not integer).

Examples of integer-type numbers are: 0, +275, -0634, +0, -2. Examples of real-type numbers are: +5.34⏨8 (i.e., the number $5.34 \cdot 10^{8}$), -.063 (the number -0.063), -.37⏨-32 (the number $-0.37 \cdot 10^{-32}$), +⏨5 (the number $10^{5}$), etc.
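The rule that separates the two types can be expressed as a small parser. This is a sketch, not part of ALGOL itself: it assumes the exponent symbol is written as the Unicode character ⏨ (U+23E8), accepts only the ASCII minus sign, and does little validation of malformed input. A number is of integer type exactly when it has neither a decimal point nor an exponent part; an omitted mantissa, as in ⏨5, stands for 1.

```python
TEN = "\u23e8"  # the ALGOL subscript-ten symbol marking a decimal exponent


def parse_algol_number(text):
    """Return (value, type) for an ALGOL number such as '+5.34⏨8'."""
    s = text.replace(" ", "")
    sign = -1 if s.startswith("-") else 1
    if s[:1] in ("+", "-"):
        s = s[1:]
    mantissa, has_exponent, exponent = s.partition(TEN)
    if not has_exponent and "." not in mantissa:
        return sign * int(mantissa), "integer"
    # float() does the decimal conversion; an omitted mantissa (as in ⏨5) means 1.
    literal = (mantissa or "1") + ("e" + exponent if has_exponent else "")
    return sign * float(literal), "real"
```

Applied to the examples above, `parse_algol_number("-0634")` gives `(-634, "integer")` and `parse_algol_number("+⏨5")` gives `(100000.0, "real")`.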

When particular quantities are introduced in ALGOL (in writing specific programs), they must first be described. Subsequent concrete representations of these quantities must be interpreted in accordance with these descriptions. If, for example, a quantity x has been described as integer and is then given a value of, say, 23.4, this value must be rounded to the nearest integer (23 in this case). A value of 23.5 for a quantity of integer type is rounded to 24 (the nearest larger integer), and not to 23.* We note also that if the quantity x is described as real, nothing prevents it from taking integral values; however, in subsequent operations x is treated like any real quantity, without rounding to the nearest integer.
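The rounding rule just described agrees with the ALGOL 60 report, where a real value assigned to an integer variable becomes entier(E + 0.5), that is, the floor of E + 0.5. A one-line Python sketch follows. It deliberately avoids Python's built-in `round`, which sends exact halves to the nearest even integer and so would turn 22.5 into 22.

```python
import math


def to_integer(x):
    """ALGOL-style conversion of a real value to integer type: entier(x + 0.5)."""
    return math.floor(x + 0.5)
```

With this rule 23.4 becomes 23, 23.5 becomes 24, and -23.5 becomes -23, in each case the nearest larger integer when there is a tie.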

To designate various kinds of quantities (constants and variables), ALGOL uses so-called identifiers. Any finite sequence of (Latin) letters and decimal digits that begins with a letter (and not with a digit) can serve as an identifier. Examples of identifiers are a7L0, x, ga, aPg, TOWW, etc. At the same time the expressions 7x, bab or ab + x cannot serve as identifiers. Because not only single letters but also words, i.e., sequences of letters (possibly meaningless), can designate quantities, the supply of identifiers is potentially unlimited, which is quite important for the possibility of representing any algorithm, however complex. The possibility of designating a quantity by its natural name, for example force, current, etc., is also obviously convenient. At the same time there is one inconvenience we shall have to contend with: when arithmetic expressions are constructed from identifiers, the multiplication sign cannot be omitted (as is usually done in algebra), since ALGOL will understand the expression ab + xy as the sum of two quantities designated by ab and xy, and not as the sum of the products of the quantities a, b and x, y.
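The identifier rule is easy to state as a regular expression: a letter followed by any letters or digits. A small sketch using Python's standard `re` module; the name `IDENTIFIER` is ours. Scanning ab + xy with the same pattern also shows why the multiplication sign cannot be dropped: the scanner sees two identifiers, ab and xy, not four single-letter quantities.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """True if text is a letter followed by any number of letters or decimal digits."""
    return IDENTIFIER.fullmatch(text) is not None
```

For instance, `IDENTIFIER.findall("ab + xy")` yields `["ab", "xy"]`, while `is_identifier("7x")` is false.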

In addition to quantities taking numerical values (of integer and real type), ALGOL uses Boolean quantities, which take only two values, "true" and "false". Boolean quantities are designated by identifiers in just the same way as numerical quantities; in the descriptions they are assigned to the Boolean type.

Quantities of the same kind, for example the components of a vector or the elements of a matrix, are usually denoted by identifiers with one or several indices. The indices are written after the identifier and enclosed in square brackets; different indices are separated by commas. Integers (not necessarily positive), variables and arbitrary arithmetic expressions, provided they are always of integer type, can be used as indices.

Examples of variables with indices are: A[1, -2], ps[1], bA8[i, j, 1].

Variables with indices that vary within certain limits constitute so-called arrays. An array description in ALGOL is introduced by the English word array, preceded by the name of the type (integer, real, Boolean) of the variables composing the array (if the type is not indicated, the variables are considered to be of real type). In the description of an array, the array identifier is followed by the so-called list of bound pairs, written in index (square) brackets. Each bound pair consists of two arithmetic expressions (or numbers) separated by a colon. The first of these expressions is the lower bound (smallest possible value) of the corresponding index, and the second is its upper bound (largest possible value). The index is assumed to run through all integral values from the lower bound to the upper bound inclusive; if an upper bound is less than the corresponding lower bound, the array is considered undefined.

Examples of array descriptions: real array x[1:n, 0:m], Boolean array g[0:5], integer array N[-7:1, i:j, 3:3].

The number of different indices characterizing an array is termed the dimension of the array. In the examples just given, the array x has dimension 2 and the array g has dimension 1. As for the array N, its formal dimension is 3; however, since the last index can take only a single value (equal to 3), its dimension actually reduces to 2.
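Arrays with arbitrary bound pairs can be modelled by a small class. This is a sketch of the behaviour described above, not of any ALGOL implementation: the class name and its properties are hypothetical, and unset elements simply read as 0, a simplification, since in ALGOL their values would be undefined. `dimension` counts all indices, while `effective_dimension` counts only those that can take more than one value.

```python
class AlgolArray:
    """ALGOL-style array whose indices run over given (lower, upper) bound pairs."""

    def __init__(self, *bound_pairs):
        for lower, upper in bound_pairs:
            if upper < lower:
                raise ValueError("upper bound below lower bound: array undefined")
        self.bound_pairs = bound_pairs
        self._cells = {}                      # simplification: unset elements read as 0

    @property
    def dimension(self):                      # formal dimension: number of indices
        return len(self.bound_pairs)

    @property
    def effective_dimension(self):            # indices that can take more than one value
        return sum(1 for lower, upper in self.bound_pairs if upper > lower)

    def _key(self, index):
        key = index if isinstance(index, tuple) else (index,)
        if len(key) != self.dimension:
            raise IndexError("wrong number of indices")
        for value, (lower, upper) in zip(key, self.bound_pairs):
            if not lower <= value <= upper:
                raise IndexError(f"index {value} outside {lower}:{upper}")
        return key

    def __getitem__(self, index):
        return self._cells.get(self._key(index), 0)

    def __setitem__(self, index, value):
        self._cells[self._key(index)] = value
```

With illustrative bounds, `AlgolArray((-7, 1), (1, 4), (3, 3))` has dimension 3 but effective dimension 2, like the array N above.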

We note that in descriptions of variables or arrays of the same type the name of the type need be written only once, the corresponding identifiers being separated from one another by commas. For arrays with the same bounds, the index brackets with the corresponding list of bound pairs can be written out only once, after the last identifier of an array with these bounds. For example, real a, b, x7 or integer array A, R, D[1:2, 1:k]. The first describes three variables that take real values, and the second describes three two-dimensional arrays with the same bounds, composed of integral quantities.

Descriptions of variables or arrays of different types are separated by semicolons. For example: real x, y; Boolean A, B, C; array px[1:2, i:k]; integer array S[5:10], q[2:4].

Arbitrary arithmetic expressions, which play a large role in the construction of the ALGOL language, can serve as the index bounds of arrays. Arithmetic expressions are constructed from numbers and variables by means of six arithmetic operations: addition (denoted by the sign +), subtraction (the sign -), multiplication (the sign ×), division (the slash /), integral division (the sign ÷) and raising to a power (the symbol ↑).

The integral quotient a ÷ b is the integer part of the ordinary quotient a/b, obtained by rounding toward zero (i.e., to the nearest integer in the direction of decreasing modulus). This operation is applied only to quantities of integer type, so that the expression 3.0 ÷ 5.0 is to be considered undefined (while the expression 3 ÷ 5 is defined and equal to zero). The operation of raising to a power, a ↑ b (a to the power b), is defined for positive a and all quantities b of real and integer type, while for negative a it is defined only when b is of integer type.
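Both rules can be sketched in Python, with one caution: Python's own `//` rounds toward minus infinity (so -7 // 2 is -4), whereas ALGOL's ÷ truncates toward zero (-7 ÷ 2 is -3). The function names are ours; the power check enforces only the restriction stated above and leaves a zero base to Python's own behaviour, since the text does not discuss it.

```python
def algol_idiv(a, b):
    """a ÷ b: integer operands only; the quotient is truncated toward zero."""
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("÷ is defined only for integer-type operands")
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def algol_power(a, b):
    """a ↑ b; a negative base requires an exponent of integer type."""
    if a < 0 and not isinstance(b, int):
        raise ValueError("a negative base needs an integer-type exponent")
    return a ** b
```

So `algol_idiv(3, 5)` is 0, `algol_idiv(3.0, 5.0)` is rejected, and `algol_power(-8, 1/3)` raises an error instead of returning the complex number that plain `(-8) ** (1/3)` would give in Python.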

All else being equal, raising to a power is performed first in an arithmetic expression, then multiplication and division (ordinary and integral), and then addition and subtraction. Operations of the same level (multiplication and division, or addition and subtraction) are performed in the conventional order, from left to right. When operations must be performed in a different order, round brackets are used. In raising to a power, a ↑ b, the expressions a and b must as a rule be enclosed in brackets. Exceptions are permitted only when the corresponding quantity (not enclosed in brackets) is an unsigned number, a variable (with or without indices), or a function (see below).

Examples of arithmetic expressions are x ↑ 2 (equal to $x^2$), 3 ↑ n ↑ k (equal to $(3^n)^k$), ab × AB + ff ↑ (-q), (a7 + A9) ↑ (-2), etc.
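The example 3 ↑ n ↑ k shows that in ALGOL a chain of powers is evaluated from left to right. Python's `**` groups the other way (3 ** 2 ** 3 means 3 ** 8), so a direct transcription would change the value. A short sketch with `functools.reduce` reproduces the ALGOL grouping; the function name is ours.

```python
from functools import reduce


def algol_power_chain(*operands):
    """x1 ↑ x2 ↑ ... ↑ xk evaluated strictly from left to right, as in ALGOL."""
    return reduce(lambda acc, e: acc ** e, operands)
```

For n = 2 and k = 3, `algol_power_chain(3, 2, 3)` gives $(3^2)^3 = 729$, while Python's `3 ** 2 ** 3` gives 6561.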

Along with variables represented by ordinary identifiers or by array identifiers, so-called functions are also used in constructing arithmetic expressions. Every function in ALGOL is designated by an identifier followed by the so-called list of actual parameters in round brackets, i.e., in other words, the arguments of this function. The actual parameters can be any expressions (arithmetic or Boolean), as well as array identifiers and certain other kinds of identifiers defined below. The parameters are separated from one another either by commas or by a string of the form ) any commentary: ( . A commentary is an explanation of the meaning of the actual parameters, which we shall usually give in Russian. In translation from ALGOL into the language of a particular computer, the commentary is simply discarded.

Examples of functions might be f(x), a function of the arguments x and y + a, and a function with the parameter part (k ↑ 1.5) force: (p) acceleration: (a). The first of these functions is single-place (i.e., it depends on one actual parameter), the second is two-place, and the third is three-place (it depends on the actual parameters k ↑ 1.5, p and a).
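Since a delimiter of the form ) commentary: ( plays exactly the role of a comma, a parameter part can be reduced mechanically to a plain comma-separated list. The Python sketch below is a hypothetical helper, not part of any actual translator. It assumes that a commentary consists of words made of letters (of any alphabet) and that brackets inside the parameters are balanced.

```python
import re

# ") letters: (" acts as a comma between parameters; the letters may belong
# to any alphabet and may form several words separated by spaces.
_COMMENTARY_DELIMITER = re.compile(r"\)\s*[^\W\d_]+(?:\s+[^\W\d_]+)*\s*:\s*\(")


def split_actual_parameters(param_part: str) -> list[str]:
    """Split a parameter part such as "(k ↑ 1.5) force: (p) acceleration: (a)"."""
    text = _COMMENTARY_DELIMITER.sub(",", param_part.strip())
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("parameter part must be enclosed in round brackets")
    params: list[str] = []
    current: list[str] = []
    depth = 0  # nesting depth of brackets inside the parameter list
    for ch in text[1:-1]:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:  # only a top-level comma ends a parameter
            params.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    params.append("".join(current).strip())
    return params


print(split_actual_parameters("(k ↑ 1.5) force: (p) acceleration: (a)"))
# ['k ↑ 1.5', 'p', 'a']
```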

In the descriptions, functions are usually termed procedures, with an indication of the type of quantity they define (integer, real or Boolean). For functions such as sine, logarithm and others we retain the commonly accepted identifiers sin, ln, etc. We adopt the identifier sqrt for the square root and the identifier abs for the absolute value.

The descriptions of functions include headings of the form real procedure sin(x); integer procedure abs(n); Boolean procedure A(a, b). In these descriptions the function arguments are represented by so-called formal parameters, i.e., identifiers which, when the function is later used, can be replaced by any actual parameters: variables, expressions (arithmetic, Boolean, designational), identifiers of arrays, procedures or switches, and also the so-called lines.

Expressions constructed from numbers, variables of real and integer type (with or without indices) and functions (integer or real) by means of the arithmetic operations are termed simple arithmetic expressions. To construct more complex arithmetic expressions we must first become acquainted with the so-called Boolean expressions.

Boolean expressions are constructed from the logical values ("true" and "false"), variables and functions (procedures) of Boolean type, and the so-called relations, i.e., two arithmetic expressions A and B connected by an equality or inequality sign: A = B, A ≠ B, A > B, A ≥ B, A < B, A ≤ B. The operations used to construct Boolean expressions are the logical operations described in Chapter 2: equivalence (≡), implication (⊃), disjunction (∨), conjunction (∧) and negation (¬). The priority of the logical operations relative to one another and the use of brackets (only round brackets in the present case) remain the same as in Chapter 2. We need only add that the arithmetic operations (expressions) take precedence over all the relations, and the relations take precedence over all the logical operations, so that the expression a + b > c × d ⊃ x ∧ y ∨ z must be understood as ((a + b) > (c × d)) ⊃ ((x ∧ y) ∨ z). The quantities a, b, c, d in this expression are of real or integer type, and the quantities x, y, z are Boolean.
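The precedence rules can be checked by evaluating the example with explicit grouping. Python has no implication operator, so the sketch below defines a hypothetical helper, implies. The function name example and the test values are likewise only illustrative.

```python
def implies(p: bool, q: bool) -> bool:
    """ALGOL implication p ⊃ q: false only when p is true and q is false."""
    return (not p) or q


def example(a: float, b: float, c: float, d: float,
            x: bool, y: bool, z: bool) -> bool:
    """a + b > c × d ⊃ x ∧ y ∨ z with ALGOL's precedence made explicit:
    arithmetic first, then the relation, then ∧, then ∨, and ⊃ last."""
    relation = (a + b) > (c * d)
    return implies(relation, (x and y) or z)


print(example(1, 1, 1, 1, False, True, False))   # False: 2 > 1 holds but x ∧ y ∨ z is false
print(example(0, 0, 1, 1, False, False, False))  # True: 0 > 1 fails, so the implication holds
```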

All  the  Boolean  expressions  defined  so  far  are  termed  simple. 

From Boolean expressions A, B, C, of which the first, A, is simple, we can compose a more complex Boolean expression by means of the service words if, then and else. The corresponding construction has the form

if C then A else B.

The complex Boolean expression thus defined is taken to be A if the condition C is satisfied (i.e., if the Boolean expression C has the value "true"), and B otherwise. Since of the three expressions A, B, C only A must be simple, the expressions B and C can in turn be composed with the aid of some condition. Finally, in constructing simple Boolean expressions it is permissible to use, along with logical values, variables, relations and functions, any Boolean expression (simple or complex) enclosed in round brackets.

Thus recursive constructions of any depth are possible in forming Boolean expressions. For example, the Boolean expressions include the expression if a then (if a = b + c then B else C) else if D then E else F ∧ G(K, L), where the lower case letters denote variables of real type, the capital letters denote variables of Boolean type, and G(K, L) is a function (Boolean procedure) of the actual parameters K and L. If this entire expression is enclosed in round brackets, it becomes a simple Boolean expression and as such can be used in further constructions.

The situation is completely analogous for arithmetic expressions: from two arithmetic expressions A and B, of which the first is necessarily simple, and a Boolean expression C, we can construct the complex arithmetic expression

if C then A else B.

The value of this expression is taken equal to A if the condition C is satisfied, and equal to B otherwise. Just as with Boolean expressions, in constructing simple arithmetic expressions it is permissible to use not only numbers, variables and functions, but also any arithmetic expressions (simple or complex) enclosed in round brackets. So, for example, the expressions (if a > b then a ↑ 2 else a ↑ 3) and (if a = b then a - b else a + b) ↑ (a - b ↑ 2) must be considered simple arithmetic expressions.
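A conditional expression selects one of two expressions and computes only the selected one. Python's own conditional expression behaves the same way. The sketch below makes this visible by recording which branch was computed; the helper names are hypothetical. Python also allows a conditional after then, so the ALGOL restriction that only the expression after else may itself be conditional has to be observed by the writer.

```python
evaluated: list[str] = []  # names of the branches actually computed


def branch(name: str, value: float) -> float:
    """Record that a branch was computed and return its value."""
    evaluated.append(name)
    return value


def square_or_cube(a: float, b: float) -> float:
    """(if a > b then a ↑ 2 else a ↑ 3): only the selected branch is computed."""
    return branch("then", a ** 2) if a > b else branch("else", a ** 3)


def sign_of(x: float) -> int:
    """if x > 0 then 1 else if x < 0 then -1 else 0: the conditional is nested
    only in the part after else, as ALGOL requires."""
    return 1 if x > 0 else (-1 if x < 0 else 0)


# A bracketed conditional expression is simple, so it may serve as an operand.
print(1 + square_or_cube(3, 2))  # 1 + 3 ↑ 2 = 10
print(evaluated)                 # ['then']
```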

All the constructions described so far are in essence auxiliary. The basic means for constructing algorithms in ALGOL are the so-called operators. ALGOL-60 contains six different types of operators: the assignment operator, the transfer operator, the empty operator, the cycle operator, the procedure operator and the conditional operator. The first five types, in contrast with the last one, the conditional operator, are usually termed unconditional operators.

The assignment operator assigns to particular variables definite values specified by some arithmetic or Boolean expression A. The variables to which the value of A is assigned are separated from one another and from this expression by a special separator, the assignment symbol :=. All these variables constitute the left part of the assignment operator, and the expression A constitutes its right part.

Examples of assignment operators are: m := m + 1; an operator assigning to a Boolean variable the value of the relation a > b; and an operator assigning to a subscripted variable the value of 5 - 3 × v ↑ 2.

In performing an assignment operator, a strictly defined order of operations must be observed. First, proceeding from left to right, the values of the indices of all the variables of the left part are computed (these indices are specified by arithmetic expressions, which in this case are termed subscript expressions). Then the value of the expression in the right part is computed, and the value obtained is assigned to all the variables of the left part (with the subscripts already computed). Thus, for example, the operator A := B := p + q must be performed as z := p + q; B := z; A := z, and not as B := p + q; A := p + q. The difference is that the value of the expression p + q can change each time it is computed (for example, if it contains a function whose value is determined by a procedure that changes something in the course of its execution). Therefore, in performing the assignment, the value of the right part must be computed only once, and not separately for each individual assignment.
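The order of evaluation in a multiple assignment can be imitated as follows. This Python sketch is only an illustration, built on a few assumptions. The left-part variables are elements of Python lists. Each subscript expression is a function that is evaluated when called. The hypothetical function p_plus_q returns a new value on each call, so a second evaluation of the right part would be detected.

```python
trace: list[str] = []  # order in which the parts of the operator are evaluated
calls = 0              # number of evaluations of the right part


def subscript(name: str, value: int) -> int:
    """A subscript expression that records when it is evaluated."""
    trace.append(name)
    return value


def p_plus_q() -> int:
    """Hypothetical right part whose value changes on every evaluation."""
    global calls
    calls += 1
    trace.append("right part")
    return 10 * calls


def multiple_assign(targets, right_part):
    """targets: list of (array, subscript function); right_part: a function."""
    # 1. compute the subscripts of all left-part variables, from left to right
    resolved = [(array, index()) for array, index in targets]
    # 2. compute the right part exactly once
    value = right_part()
    # 3. store that one value into every left-part variable
    #    (right to left, as in z := p + q; B := z; A := z)
    for array, i in reversed(resolved):
        array[i] = value
    return value


A, B = [0] * 4, [0] * 4
# A[1] := B[2] := p + q
multiple_assign([(A, lambda: subscript("A subscript", 1)),
                 (B, lambda: subscript("B subscript", 2))], p_plus_q)
print(A, B, trace)  # [0, 10, 0, 0] [0, 0, 10, 0] ['A subscript', 'B subscript', 'right part']
```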

Operators in ALGOL can be provided with labels, for which any identifiers or unsigned integers are used (in the latter case, prefixing zeros to the number does not alter the label). However, to simplify the construction of translators (programs for translation from ALGOL into machine languages), the use of numbers as labels is frequently avoided. The label is separated from the operator by a colon. The operators (labeled or unlabeled) are written one after another and separated by semicolons, for example: p: A; B; kl: C, where A, B, C are operators, and p and kl are labels (the operator B is unlabeled). The same operator can have not just one but as many labels as desired (separated from one another by colons), for example p: A: r7: A (here the operator A has three labels: p, A and r7).

Ordinarily the operators in ALGOL are performed sequentially, one after another, in the order in which they are written. The order of performance is changed by an operator termed the transfer operator. In its simplest form the transfer operator consists of the service words go to and some label L. Its action is that, on reaching it, a transfer (jump) is made to the operator having L as its label.

In the general case, the words go to in a transfer operator are followed by some designational expression. A label is only one of the simplest designational expressions. A more complex example is the designational expression composed of two labels, say L and M, and some Boolean expression C: if C then L else M. The value of this designational expression is L if the condition C is satisfied, and M otherwise. Any complex designational expression can be substituted in this expression in place of the label M (but not in place of the label L), and the process of substitution can be continued.

As a result, complex recursive constructions of the transfer operator can arise, for example go to if i = 1 then L else if i = 2 then M else P. To simplify such constructions the so-called switch transfer is used. The switch consists of an identifier followed by a so-called index expression enclosed in index (i.e., square) brackets. The index expression is any arithmetic expression whose computed value must every time be rounded off to the nearest integer.

If, for example, the switch identifier is s and the index expression is the variable (expression) i, then the switch transfer operator is written go to s[i]. By itself such an expression does not yet have any meaning. To give it meaning, in addition to the expression s[i], which we shall term the switch indicator and regard as a simple designational expression, we must also introduce the so-called switch description. It is usually placed together with the descriptions of the types of variables, arrays and procedures (functions). The switch description begins with the service word switch, followed by the switch identifier, then the assignment symbol := and, finally, the so-called switch list, i.e., a list of designational expressions separated from one another by commas. For example: switch s := L, M, P (where L, M, P are labels).

On encountering a switch transfer operator, for example go to s[i], we compute the corresponding index expression, substituting in it the current values of the variables (say, i = 2). We then turn to the description of the switch with the same identifier s and transfer to the designational expression in this description whose ordinal number in the list coincides with the value found for the index expression. In the case considered, a transfer will be made to the label M, i.e., to the second element of the switch list.

If the value of the index expression in the switch indicator cannot be calculated (because values have not yet been assigned to some of the variables), or if the calculation yields a number which is not the number of any element of the switch list, then the transfer operator is not performed and the operator following it is performed immediately. In the example above, values of the index expression equal to 4.0 or -1 do not lead to any transfer. However, values equal to 2.2 or 2.7 lead (after rounding) to transfers to the second or, correspondingly, the third element of the switch list (i.e., to the labels M or P).
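The rules for a switch transfer can be summarized in a few lines. The Python sketch below is illustrative only and rests on three assumptions. The switch list is a list of label names. None stands for an index expression that cannot be computed. Rounding to the nearest integer sends halves upward, i.e. entier(E + 0.5), a detail the text above does not fix.

```python
import math
from typing import Optional, Sequence


def round_index(value: float) -> int:
    """Round to the nearest integer, halves upward: entier(value + 0.5)."""
    return math.floor(value + 0.5)


def switch_target(switch_list: Sequence[str],
                  index_value: Optional[float]) -> Optional[str]:
    """Label selected by go to s[i], or None when no transfer takes place."""
    if index_value is None:          # the index expression cannot be computed
        return None
    k = round_index(index_value)
    if 1 <= k <= len(switch_list):   # switch list elements are numbered from 1
        return switch_list[k - 1]
    return None                      # no such element: the next operator follows


s = ["L", "M", "P"]                  # switch s := L, M, P
for i in (2, 2.2, 2.7, 4.0, -1, None):
    print(i, switch_target(s, i))
```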

A label, a switch indicator, or any designational expression enclosed in round brackets is a simple designational expression. From two designational expressions A and B (of which the first is necessarily simple) and a Boolean expression C we can compose the complex designational expression if C then A else B, which coincides with the expression A when the condition C is satisfied and with the expression B otherwise. Thus exactly the same recursive constructions are possible for designational expressions as for arithmetic (or Boolean) expressions.

The third type of operator used in ALGOL is the so-called empty operator, which performs no operation and is denoted by an empty sequence of symbols. The empty operator is usually provided with a label. It then serves as the point to which control returns, through a transfer operator referring to this label, so as to reach the required segment of the program. A labeled empty operator is indispensable, for example, when a transition must be made from the middle of a program to its end (not to the next nonempty operator of the program, but precisely to its end). Just as with the other operators, a colon must be placed after the label of the empty operator.
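The joint action of labels, the transfer operator and a labeled empty operator at the end of a program can be imitated by a very small interpreter. This Python sketch is our own illustration and does not describe any actual translator. A program is a list of operators, each carrying a list of labels and an action. An action returns a label name to imitate go to, and the action None plays the part of the empty operator. The sample program and its variables i, s, n, limit are hypothetical.

```python
from typing import Callable, Optional

# An action may return the name of a label (the effect of "go to");
# None as the action itself stands for the empty operator.
Action = Optional[Callable[[dict], Optional[str]]]


def run(program: list[tuple[list[str], Action]], env: dict) -> dict:
    """Perform the operators in order, following any transfers."""
    position = {}
    for pos, (labels, _) in enumerate(program):
        for label in labels:             # an operator may carry several labels
            position[label] = pos
    pc = 0
    while pc < len(program):
        _, action = program[pc]
        target = action(env) if action is not None else None
        pc = position[target] if target is not None else pc + 1
    return env


def add_next(env: dict) -> None:
    env["i"] += 1                        # i := i + 1
    env["s"] += env["i"]                 # s := s + i


program = [
    ([], lambda e: e.update(i=0, s=0)),                      # i := 0; s := 0
    (["loop"], add_next),                                    # loop: ...
    ([], lambda e: "fin" if e["s"] > e["limit"] else None),  # if s > limit then go to fin
    ([], lambda e: "loop" if e["i"] < e["n"] else None),     # if i < n then go to loop
    (["fin"], None),                                         # fin: (empty operator)
]
print(run(program, {"n": 10, "limit": 1000}))
```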

Of very great value in constructing programs in ALGOL are the so-called cycle operators, whose meaning is that some operator (or group of operators) is performed a number of times in succession. A cycle operator consists of the cycle heading and the operator itself (which can be any operator), which is performed repeatedly in the course of the cycle.

The cycle heading begins with the service word for and ends with the service word do. After the word for stands the identifier of the variable which changes as the cycle is performed; this variable is termed the cycle parameter. It is followed by the assignment symbol := and the so-called cycle list, i.e., the list of values which the variable must take while the cycle operates. The cycle list consists of one or several cycle list elements separated from one another by commas. In the simplest case arithmetic expressions (in particular, simply numbers) are used as the elements of the cycle list. For example, the cycle operator for i := 1, 2, 3 do a[i] := i ↑ 2 performs the sequential assignments a[1] := 1; a[2] := 4; a[3] := 9. The length of the cycle in this case is equal to 3.
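The effect of a cycle list made of single expressions can be sketched as follows. In this Python illustration each list element is a function that is evaluated only when its turn comes; this is how we model the sequential assignment described above. The names for_list and body are hypothetical.

```python
from typing import Callable, Iterable


def for_list(elements: Iterable[Callable[[], int]],
             body: Callable[[int], None]) -> None:
    """for i := e1, e2, ... do body: each element is evaluated when its turn
    comes, its value is given to the cycle parameter, and the body runs once."""
    for element in elements:
        body(element())


a: dict[int, int] = {}


def body(i: int) -> None:
    a[i] = i ** 2                    # a[i] := i ↑ 2


for_list([lambda: 1, lambda: 2, lambda: 3], body)  # for i := 1, 2, 3 do ...
print(a)                             # {1: 1, 2: 4, 3: 9}
```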

If the cycle parameter must take not three but, say, a thousand different values, then listing all these values in the cycle list would be excessively cumbersome. In this case we use as cycle list elements special constructions, each of which gives at once a whole set of values of the cycle parameter.

ALGOL uses two types of such constructions. The first type is built with the service words step and until and has the form A step B until C, where A, B and C are arithmetic expressions. A cycle list element of this type gives the values of the cycle parameter as follows: in the first step the cycle parameter is assigned the value of the arithmetic expression A, in the second step the value $A_1 = A + B$, in the third the value $A_2 = A_1 + B$, and so on, until the next value $A_n = A_{n-1} + B$ passes beyond the value of the arithmetic expression C.* This value is not assigned to the cycle parameter, and the cycle is not performed for it. The cycle list element is then considered exhausted, and control passes to the next element of the cycle list or, if there is none, to the operator directly following the cycle operator.

As an example of this construction, consider the cycle operator for i := 12, 4 step -1 until 0, -5 do a[i] := i + 10. This operator performs the sequential assignments a[12] := 22; a[4] := 14; a[3] := 13; a[2] := 12; a[1] := 11; a[0] := 10; a[-5] := 5. We note that in this example the cycle list element 4 step -1 until 0 describes an arithmetic progression (with a difference equal to minus 1). In the general case, however, the step represented by the arithmetic expression B (standing after the word step) can be a variable which changes with every new repetition of the cycle.
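The rule for the element A step B until C can be written out as a generator. This Python sketch follows, as we understand it, the expansion given in the ALGOL 60 report. There the test is whether (V - C) × sign(B) > 0, so "passing beyond" C is judged in the direction of the step. B and C are passed as functions and re-evaluated on every pass, so a varying step is handled. The helper name step_until is hypothetical; the usage reproduces the example above.

```python
from typing import Callable, Iterator

Thunk = Callable[[], int]


def step_until(start: Thunk, step: Thunk, limit: Thunk) -> Iterator[int]:
    """Successive values of the cycle list element 'A step B until C'."""
    v = start()                        # V := A
    while True:
        b = step()
        direction = (b > 0) - (b < 0)  # sign(B)
        if (v - limit()) * direction > 0:
            return                     # V has passed beyond C: element exhausted
        yield v                        # the operator after do is performed here
        v = v + step()                 # V := V + B, with B evaluated afresh


a: dict[int, int] = {}
# for i := 12, 4 step -1 until 0, -5 do a[i] := i + 10
elements = [lambda: iter([12]),
            lambda: step_until(lambda: 4, lambda: -1, lambda: 0),
            lambda: iter([-5])]
for element in elements:               # the cycle list elements, in order
    for i in element():
        a[i] = i + 10
print(a)  # {12: 22, 4: 14, 3: 13, 2: 12, 1: 11, 0: 10, -5: 5}
```

Note that with a step of zero the test never succeeds, so such an element would never be exhausted.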

The second type of cycle list element is formed from an arithmetic expression A, a Boolean expression B and the service word while, written in the sequence A while B. This element assigns to the cycle parameter, in succession, the values taken by the arithmetic expression A for as long as the condition B is satisfied (i.e., as long as the expression B has the value "true"). If at some repetition of the cycle the condition B ceases to be satisfied, then the operator of the cycle is not performed, and control passes to the next element of the cycle list or, if there is none, to the operator directly following the cycle operator. We note that the word step may not be used in this construction, so that an expression of the type A step B while C does not occur in ALGOL.
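The element A while B can be sketched in the same style. As an assumption of the sketch, the cycle parameter lives in a dictionary so that the expression A can refer to its previous value. The cycle x := x / 2 while x > 1 is a hypothetical example. Unlike ALGOL, the sketch leaves the last computed value in the parameter after the element is exhausted rather than making it indeterminate.

```python
from typing import Callable, Iterator


def while_element(env: dict, name: str,
                  expr: Callable[[dict], float],
                  cond: Callable[[dict], bool]) -> Iterator[float]:
    """Values given by the element 'A while B' for the cycle parameter `name`:
    assign A, test B, perform the body, and repeat."""
    while True:
        env[name] = expr(env)    # parameter := A (A may use the parameter)
        if not cond(env):        # B is false: the element is exhausted
            return
        yield env[name]          # the operator after do is performed here


env = {"x": 100.0}
# for x := x / 2 while x > 1 do ... (here the body only records x)
values = list(while_element(env, "x", lambda e: e["x"] / 2, lambda e: e["x"] > 1))
print(values)  # [50.0, 25.0, 12.5, 6.25, 3.125, 1.5625]
```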

With the method described above, the cycle operator can repeat only the single operator which follows immediately after the word do. If a cycle must repeat not one operator but some sequence of operators $A_1; A_2; \ldots; A_k$, then this sequence is enclosed in special operator brackets and is thereafter treated as a single complex operator.

As the operator brackets we use the pair of service words begin and end, so that the complex operator is written begin $A_1; A_2; \ldots; A_k$ end. Complex operators, just like ordinary ones, can be provided with labels (one or several).

Along with complex operators, ALGOL uses the so-called blocks. A block differs from a complex operator in that, directly after the word begin and ahead of the operators making up the block, there is placed a description of the types of certain quantities (identifiers) encountered in this block. In this case the quantities described in the block are local to the given block and, generally speaking, lose their values (become indeterminate) on departure from the block. If we wish some of the quantities described in the block to retain their values after leaving it, in order to use them when the block is entered again, then the word own is added to the description of their types. An example of a block: begin own real •*; integer n; n := s-M; r := a[n] end

We note that both complex operators and blocks can include other blocks and complex operators, so that such constructions can be nested to any depth. The identifiers used within a block to designate its local quantities can be used outside the block to designate any other quantities; those outer quantities are then not accessible within this block (i.e., they do not figure in the given block and are not subjected to any transformation in it). Identifiers which are not described in a block are not local to it and, consequently, represent the same objects both inside and outside the block.
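The difference between an ordinary local quantity and one described with own can be imitated in Python with a closure. In the sketch below, which is only an analogy, total plays the part of an own variable and count that of an ordinary local one; both names are hypothetical. ALGOL does not fix the initial value of an own variable, so starting from 0 is an assumption.

```python
def make_block():
    """Model a block with one own variable and one ordinary local variable."""
    total = 0  # own integer total; its starting value 0 is an assumption

    def enter_block(x: int) -> tuple[int, int]:
        nonlocal total
        count = 0            # ordinary local: a fresh quantity on every entry
        count += 1
        total += x           # own: keeps its value from the previous entry
        return total, count

    return enter_block


block = make_block()
print(block(5))  # (5, 1)
print(block(7))  # (12, 1)
```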

Labels are always assumed to be local to the block in which they occur, so that a block can be entered only through its beginning. No transfer operator located outside a block can transfer control to any operator within this block.

In exactly the same way, it is not possible to transfer to a label located within a cycle operator by means of a transfer operator acting from outside the cycle. We note also that on exit from a cycle operator through exhaustion of the cycle list, the value of the cycle parameter is considered indeterminate. If, however, the exit from the cycle is made by a transfer operator contained in the operator (or block) repeated in the given cycle (i.e., standing after the word do), then the cycle parameter retains the value it had immediately before the transfer operator was performed.
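The two ways of leaving a cycle can be contrasted in a short Python sketch. It models the hypothetical cycle for i := 1 step 1 until n do if a[i] = target then go to found. The sentinel INDETERMINATE is our own device for marking the parameter as indeterminate; an actual translator need not mark anything.

```python
INDETERMINATE = object()  # stands for an "indeterminate" value


def find(values: list[int], target: int):
    """Return the cycle parameter as it stands after leaving the cycle."""
    env = {"i": INDETERMINATE}
    for i in range(1, len(values) + 1):
        env["i"] = i
        if values[i - 1] == target:
            return env["i"]          # left by a transfer: i keeps its value
    env["i"] = INDETERMINATE         # cycle list exhausted: i is indeterminate
    return env["i"]


print(find([4, 7, 9], 7))                   # 2
print(find([4, 7, 9], 5) is INDETERMINATE)  # True
```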

All the simple operators described above, the procedure operator described below, and all complex operators and blocks together form the system of so-called unconditional operators. In addition, ALGOL introduces two types of conditional operators. They use the condition if B then (where B is a Boolean expression), which is analogous to the condition used in constructing complex arithmetic, Boolean and designational expressions.

The "if" operator is formed from the condition just described and a following unconditional operator. That operator is performed when the condition is satisfied and bypassed (not performed) otherwise. Example: if a > b then begin A := n; go to L end. The complex operator begin A := n; go to L end is performed if and only if the condition a > b is satisfied.

The conditional operator proper is obtained by adding to the "if" operator the service word else and a following arbitrary operator (possibly also conditional). This operator is performed when the condition in the "if" operator is not satisfied. The general structure of the conditional operator thus has the form

if B then $A_1$ else $A_2$,

where B is any Boolean expression, $A_1$ is an unconditional operator and $A_2$ is any operator.

The so-called procedure operators are of essential importance in ALGOL. A procedure is a group of operators designated by some identifier, termed the procedure identifier. In ALGOL, procedures play the same role as subroutines in conventional programming: they speed up the composition of complex programs through the use of previously prepared standard programs. The body of a procedure (the actual text of the operators composing it) can be written either in ALGOL or directly in the language of the particular universal digital computer.

A procedure (other than the familiar standard procedures) must be described in advance. This description is made with the service word procedure, followed by the so-called procedure heading. The heading consists of the procedure identifier and, after it in round brackets, the list of so-called formal parameters of the procedure: identifiers separated from one another by special delimiters, namely commas or delimiters of the form ) letters: ( . For example, procedure sin(x) or procedure A(x, y) pressure: (p). The first procedure has one formal parameter, x, and the second has three formal parameters, x, y and p. Procedures without parameters are also possible; their headings consist only of the procedure identifier, with no brackets following it. The procedure itself (the so-called procedure body) is written after the procedure heading in the form of some operator.

The procedure operator itself, or more exactly the procedure call operator, is written in the same form as the procedure heading in the description, but without the word procedure in front of it and with the formal parameters of the procedure replaced by its so-called actual parameters. The brackets and delimiters are the same as in the description of the corresponding procedure. Performing the procedure operator consists in assigning to all the formal parameters the values of the corresponding actual parameters (or replacing the formal parameters by the actual ones) and then performing the procedure.

Any expressions (arithmetic, Boolean or designational), array and switch identifiers, identifiers of any procedures and, finally, the so-called lines can be used as actual parameters.

Lines are arbitrary sequences of symbols enclosed in special opening and closing "line" brackets (string quotes). These brackets can also be used within a line. Lines can be used as actual parameters only of those procedures which are written in machine code, and not in ALGOL. Most frequently their use is limited to the special procedure punch(x), which prints or punches the actual parameter substituted for the formal parameter x. If, in particular, some line is substituted for x, then the procedure punch sends all the symbols of this line out of the machine for printing or punching. Therefore a line can contain not only ALGOL symbols but any other symbols which the printing or punching device in question is capable of reproducing.

The procedure description also indicates the types of its formal parameters. When actual parameters are substituted, their types must coincide with the types of the corresponding formal parameters. To avoid ambiguity in such a substitution, it is usually necessary to rename those identifiers local to the procedure which coincide with identifiers occurring in the actual parameters being substituted.

We note that the procedure parameters, generally speaking, include both the input quantities of the procedure and its output quantities (those obtained as a result of performing the procedure). If performing the procedure yields only one quantity (a number or a logical value), it is natural to denote this quantity by the identifier of its procedure (together with the list of actual parameters). In this case the procedure is termed a function (see above), and in its description the word procedure is preceded by a word designating the type of the output quantity. Example: real procedure sin(x). It frequently happens that some formal parameter x takes part in several transformations within the procedure. For example, in the procedure sin(x), when the sine is calculated by means of a series, the parameter x is raised successively to the powers 3, 5, 7, etc. If in the substitution this parameter is replaced by a fairly complex expression (actual parameter), say (a + b) × (a - b), then in executing the procedure we may have to recompute this expression every time an operation is performed with the parameter x. Naturally it is simpler to compute the value of x ahead of time (prior to entry into the procedure) and substitute this value in its place.

In the automatic translation of a program from ALGOL into machine language, the translator (programming program) must be told in every case which parameter values must be computed before substitution into the procedure. All such parameters are labeled in the declaration with the special service word value and are listed after the list of formal parameters in the procedure heading, before the declarations of their types (the so-called specifications). For example, in place of the specification real x; integer n there may appear value n; real x; integer n.
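Below is a minimal Python sketch of the two substitution rules. Python has neither ALGOL mechanism natively, so we assume a zero-argument lambda can stand in for a name parameter. The names sin_series and actual_parameter and the evaluation counter are illustrative only. The series touches x in every term. Passing the expression by name therefore recomputes (a + b) × (a - b) once per term, whereas a value parameter computes it once before entry.

```python
import math

def sin_series(x_thunk, terms=6):
    # x_thunk plays the role of an ALGOL name parameter:
    # every use of x re-evaluates the caller's expression.
    total = 0.0
    for j in range(terms):
        power = 2 * j + 1  # powers 1, 3, 5, 7, ...
        total += (-1) ** j * x_thunk() ** power / math.factorial(power)
    return total

evaluations = [0]  # counts how often the actual parameter is computed

def actual_parameter(a, b):
    evaluations[0] += 1
    return (a + b) * (a - b)

a, b = 0.6, 0.5

# Name parameter: the expression is recomputed once per series term.
evaluations[0] = 0
by_name = sin_series(lambda: actual_parameter(a, b))
name_count = evaluations[0]

# Value parameter: computed once before entry, then reused.
evaluations[0] = 0
x = actual_parameter(a, b)
by_value = sin_series(lambda: x)
value_count = evaluations[0]
```

Both calls give the same number here. The difference lies only in how much work is repeated. The results could differ if the expression had side effects.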

We shall usually supplement ALGOL with two procedures, not defined by any declaration, for the input of information into the machine and its output from the machine. The first procedure is always given the identifier read and the second the identifier punch. The actual parameters of each of these procedures will be taken to be either quantities of type real, integer or Boolean, or array identifiers of any of these types.

We shall make some other remarks. Normally every ALGOL program is represented in the form of a block, i.e., it is enclosed in the statement brackets begin ... end. To make "algol" programs easier to read, so-called comments can be introduced into them. These are clarifications for the programmer which have no meaning of their own in the ALGOL language and are therefore ignored by the translator in automatic programming.

A comment is any sequence of symbols (not necessarily "algol" symbols) that begins with the service word comment after a semicolon or after the word begin, terminates with a semicolon, and contains no other semicolon. Any sequence of symbols following the word end, up to a semicolon or up to the end of the program, is also considered a comment, provided it does not contain the words end or else, or a semicolon. For example, in the expressions comment text; begin comment text; end text; the word "text" is a comment. From the point of view of the "algol" program, the first expression is equivalent to an empty place, the second to the word begin, and the third to the word end.
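The two comment rules above can be sketched as a small Python filter. It works on a program that has already been split into tokens; the tokenizer is assumed and not shown, and strip_comments is a hypothetical name. As in the first example of the text, a comment at the very start of the program is treated as if it followed a semicolon.

```python
def strip_comments(tokens):
    """Remove ALGOL 60 comments from a list of tokens (a sketch).

    Rule 1: 'comment' after ';' or 'begin' (or at the very start) runs up to
    and including the next ';'.
    Rule 2: after 'end', everything up to the next ';', 'end' or 'else'
    (or the end of the program) is commentary.
    """
    out = []
    i, n = 0, len(tokens)
    while i < n:
        tok = tokens[i]
        if tok == "comment" and (not out or out[-1] in (";", "begin")):
            while i < n and tokens[i] != ";":
                i += 1
            i += 1  # drop the terminating ';' as well
            continue
        out.append(tok)
        i += 1
        if tok == "end":
            while i < n and tokens[i] not in (";", "end", "else"):
                i += 1  # skip the commentary that follows 'end'
    return out
```

For example, the tokens of begin comment text; x := 1 end text; reduce to begin x := 1 end ;.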

For electronic digital machines of small and medium capacity, ALGOL-60 is too complex to permit efficient translation from it into machine language. A simplified variant of ALGOL, termed SMOLGOL-61, has therefore been proposed.*

The simplification amounts to the following. First, the alphabet is limited to either only lower-case or only capital letters of the Latin alphabet. Second, the logical operations of implication and equivalence are excluded, as are the service words while, Boolean, true and false. Thus, the use of the logical values "true" and "false" is not permitted. Identifiers may not be used to designate logical quantities either. The programmer can introduce Boolean variables indirectly by replacing the logical values "true" and "false" with the integers 1 and 0. Boolean expressions are permitted only in conditions. Suppose that in ALGOL the value of some Boolean expression B was assigned to an identifier P, i.e., P := B. In SMOLGOL this must be replaced by an assignment of the form P := if B then 1 else 0, whose right side is now an arithmetic expression rather than a Boolean one.

Further, the length of identifiers is limited to five letters. More exactly, identifiers whose first five letters coincide are considered identical in SMOLGOL. In the arithmetic expression a ↑ b, negative values of the exponent b are not permitted when a and b are quantities of integer type. Integers are not used as labels. The step in a for statement must either remain positive at all times or remain negative at all times, and in the latter case the "minus" sign must be placed explicitly ahead of the expression which specifies the step. A for list may contain only one step-until element.

In all procedures except the input and output procedures, strings cannot be used as actual parameters. No procedure can be called before it has been declared. One procedure cannot be used within another if they were declared in the same block. A second call of a procedure is forbidden until its first call has been completely terminated. For example, recursive calls of the procedure F(u, v) of the form F(x, F(x, y)) cannot be used. But repeated use of a procedure after its termination is not prohibited, so the expression ln(ln(x)) is perfectly acceptable in SMOLGOL. If a procedure P is an actual parameter of another procedure, then all the parameters of P must be specified as value.

The standard procedures for finding the sign and the absolute value of a number cannot be used as actual parameters of any procedure. To use them in this way, they must first be declared as functions. That is, one writes out the declaration real procedure abs(x); value x; real x; begin abs := abs(x) end, and similarly for integer procedure sign(x). Neither procedures nor their formal parameters can be of Boolean type.

Variables cannot refer to parts of the program outside the block in which they are declared. Array bounds must be constant. The elements of switch lists in switch declarations can only be labels, not arbitrary designational expressions.

The declarations of the procedures called in a block must come after the declarations of the types, switches and arrays of that block. Within a procedure, procedure identifiers can appear only as the left parts of the corresponding assignment statements. Some other limitations also exist.

We note that any program written in SMOLGOL can also be considered an "algol" program. Generally speaking, the converse is not true.

§5. EXAMPLES OF PROGRAMMING USING ALGOL-60.

Let us first consider a very simple example that was already used in §3 of the present chapter to illustrate the principles of programming in machine languages: the calculation of the sum $\sum_{k=1}^{m} 1/k^2$. The corresponding "algol" program can be written in the form

begin real s; integer m, k;
  read (m); s := 0;
  for k := 1 step 1 until m do
    s := s + 1/k↑2;
  punch (s)
end.

The procedures read(m) and punch(s) provide, respectively, for the input of the required information (the upper limit of the summation) and the output of the result, i.e., the value of the desired sum. It is easy to see that the program written in ALGOL is far more transparent and understandable than the machine program written in §3 which solves the same problem.
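For comparison, here is a direct Python transliteration of this program; the function name is illustrative. The argument m plays the role of read(m), and the return value that of punch(s). The series converges to π²/6, so the partial sum gives a quick sanity check.

```python
import math

def sum_inverse_squares(m):
    """s = 1/1^2 + 1/2^2 + ... + 1/m^2, accumulated as in the ALGOL loop."""
    s = 0.0
    for k in range(1, m + 1):
        s += 1 / k ** 2
    return s

# The partial sums approach pi^2/6 (about 1.6449) as m grows.
limit = math.pi ** 2 / 6
```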

Programs for the other examples considered in §3 can also be written lucidly and clearly. The computation of the scalar product of two (real) vectors (a1, a2, ..., an) and (b1, b2, ..., bn) is represented in the form

begin integer n; read (n);
  begin real s; integer i; real array a[1:n], b[1:n];
    read (a); read (b); s := 0;
    for i := 1 step 1 until n do
      s := s + a[i] × b[i];
    punch (s)
  end
end.
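The nested block is needed because ALGOL 60 evaluates array bounds on entry to the block that declares the arrays. The value of n must therefore already have been read in the enclosing block. A Python transliteration needs no such nesting. The sketch below (function name illustrative) simply checks that both vectors have the same length n.

```python
def scalar_product(a, b):
    """Sum of a[i] * b[i]; mirrors the for cycle of the ALGOL program."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length n")
    s = 0.0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s
```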

Multiplication of the vector (b1, b2, ..., bn) by the matrix $\|A_{ik}\|$ can be represented in ALGOL by the program

begin integer n; read (n);
  begin integer i, k; real array b[1:n], c[1:n], A[1:n, 1:n];
    read (b); read (A);
    for k := 1 step 1 until n do
    begin c[k] := 0;
      for i := 1 step 1 until n do
        c[k] := c[k] + b[i] × A[i, k]
    end;
    punch (c)
  end
end.

In this program one for statement occurs within another. Inner statement brackets are introduced because the outer loop must perform a sequence of two statements.

The two phrases written at the end of the program are a comment. They do not actually enter into the program and are ignored by the translator (programming program).
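A Python transliteration of the two nested cycles is sketched below. It assumes, as in the program above, that the result is collected in an array c with c_k = Σ_i b_i A_ik. The function name is illustrative, and A is given as a list of rows.

```python
def vector_times_matrix(b, A):
    """Row vector b times square matrix A (list of rows): c[k] = sum_i b[i]*A[i][k]."""
    n = len(b)
    c = [0.0] * n
    for k in range(n):          # outer cycle: one result component at a time
        c[k] = 0.0              # first statement of the compound statement
        for i in range(n):      # inner cycle accumulates the k-th component
            c[k] += b[i] * A[i][k]
    return c
```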

We note that ALGOL is a universal algorithmic language and is therefore suitable for writing any algorithm. In addition, as was noted above, every program written in ALGOL can be realized on any universal electronic digital machine, provided a sufficiently large memory is available.

We shall use this last circumstance to show that universal electronic digital machines can perform not only conventional algorithms, but also algorithms with random transitions and any self-organizing systems of algorithms.

To be able to construct any desired random algorithm in ALGOL, it is sufficient to introduce into it a special procedure, which we shall designate random(a, b). Each time it is called, this procedure generates a random number belonging to the segment [a, b]. The selection is assumed to follow a uniform distribution law, under which all numbers of the segment are considered equally probable. The random numbers themselves are of integer or real type, depending on the type assigned to the formal parameters (segment bounds) a and b. Of course, both these parameters must be of the same type.

The method of constructing the procedure itself can vary widely. We can, for example, simply write a table of random numbers into the machine memory and construct a procedure for selecting them in sequence. In many cases a special random-number unit is attached to the electronic digital machine. The procedure then consists in taking the numbers generated by this unit and transforming them so as to reduce them to the given interval [a, b].
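One way to realize random(a, b) on top of a raw random-number unit is sketched below in Python. The unit itself is hypothetical here: it is simulated by Python's random.Random producing 16-bit integers. RAW_BITS, raw_unit, random_int and random_real are illustrative names. For integer bounds the sketch uses rejection sampling. Simply taking the raw value modulo b - a + 1 would make some integers slightly more probable than others. For real bounds it scales the raw value, and so returns values in [a, b) rather than the closed segment.

```python
import random

RAW_BITS = 16                  # hypothetical word length of the unit
_unit = random.Random(12345)   # software stand-in for the hardware unit

def raw_unit():
    """Uniform integer in [0, 2**RAW_BITS), as the unit would deliver."""
    return _unit.getrandbits(RAW_BITS)

def random_int(a, b):
    """Uniform integer in the closed segment [a, b]."""
    span = b - a + 1
    if span <= 0 or span > 2 ** RAW_BITS:
        raise ValueError("need a <= b and a range the unit can cover")
    # Discard raw values from the incomplete top block so that every
    # integer in [a, b] stays equally probable.
    limit = (2 ** RAW_BITS // span) * span
    while True:
        r = raw_unit()
        if r < limit:
            return a + r % span

def random_real(a, b):
    """Uniform real in [a, b); the right end is never reached exactly."""
    return a + (b - a) * raw_unit() / 2 ** RAW_BITS
```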

Wide use is also made of various procedures which generate sequences of so-called pseudorandom numbers. To form such a sequence we can use, for example, the following technique. Some positive number a1 is selected and squared. In the resulting number a2 = a1² some group of digits (usually neither the highest nor the lowest) is selected. The number b2 formed by these digits is taken as the first pseudorandom number. Squaring b2, we obtain the new number a3 = b2², which we treat just as we did the number a2. Continuing this process, we obtain the required sequence of pseudorandom numbers.
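The squaring procedure just described is widely known as the middle-square method. A minimal Python sketch with four-digit numbers follows. The function name and the choice of keeping the middle four of eight digits are assumptions made for illustration. Each square is padded with leading zeros to eight digits, and the middle four are kept.

```python
def middle_square(seed, digits=4, count=10):
    """Return `count` pseudorandom numbers b2, b3, ... from the seed a1.

    `digits` should be even so that the kept group sits exactly in the middle.
    """
    if not 0 <= seed < 10 ** digits:
        raise ValueError("seed must have at most `digits` digits")
    x = seed
    numbers = []
    for _ in range(count):
        square = str(x * x).zfill(2 * digits)  # a_{j+1} = b_j^2, zero-padded
        start = digits // 2                    # skip the highest digits
        x = int(square[start:start + digits])  # b_{j+1}: the middle digits
        numbers.append(x)
    return numbers
```

For the seed 1234: 1234² = 01522756 gives 5227, and 5227² = 27321529 gives 3215.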

If its length is not too great, a sequence constructed in this fashion can be considered practically random. In a long sequence, however, various kinds of cycling occur (cyclic repetitions of previously encountered pieces of the sequence). This is what distinguishes pseudorandom sequences from purely random ones. Nevertheless, for each concrete case one can select a procedure for generating the pseudorandom sequence that is practically indistinguishable from purely random sequences.
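The cycling mentioned above is unavoidable for such a generator. With four-digit numbers there are only 10^4 possible values, so some value must eventually recur, and from then on the sequence repeats. The Python sketch below (names illustrative) records the position at which each value first appeared. From this it finds the length mu of the initial non-repeating stretch and the length lam of the cycle.

```python
def middle_square_step(x, digits=4):
    """One squaring step: keep the middle `digits` digits of x*x."""
    square = str(x * x).zfill(2 * digits)
    start = digits // 2
    return int(square[start:start + digits])

def find_cycle(seed, digits=4):
    """Return (mu, lam): the sequence from `seed` enters a cycle of
    length lam after mu steps. Terminates because there are only
    10**digits possible values."""
    seen = {}          # value -> index of its first appearance
    x, index = seed, 0
    while x not in seen:
        seen[x] = index
        x = middle_square_step(x, digits)
        index += 1
    mu = seen[x]
    return mu, index - mu
```

Values such as 0, 2500 and 3792 reproduce themselves at once. A generator that lands on one of them stops producing new numbers altogether.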

With the aid of these procedures, the problem of realizing any random algorithm on universal electronic digital machines is completely solved. Realizing self-organizing systems of algorithms on the machines is actually even simpler, since in this case we as a rule need no special procedures. Such a realization is accomplished by the usual methods, with the aid of programs written in ALGOL. We shall present examples of self-organizing systems of the kind described in the preceding chapter.

As the first example, let us consider the self-adaptive control algorithm based on the method of steepest descent. The task of this algorithm is to generate arrays A[1:n] of numbers (control actions) for which the criterion f takes the smallest possible value. The criterion f is a known function of certain parameters (readings) of the instruments monitoring the process. The values of these parameters, in the form of the corresponding array B[1:m], are periodically introduced into the algorithm.

The parameters composing the array B (the control results) vary because of changes in the control actions (array A). They also vary because of other factors which belong to the controlled process and do not depend on the control algorithm. These factors reduce to changes in certain uncontrolled parameters, and the nature of these changes is not known to the control algorithm in advance. It is easy to see that the described control algorithm accomplishes extremal regulation with respect to the criterion f. The quality of this regulation is better the smaller the ratio of two times: the time the algorithm needs to determine the optimal control actions (array A), and the average time over which the uncontrolled parameters change noticeably. It is not difficult to verify that the control algorithm in question can be described in ALGOL by the following program:

begin integer i; real y, s, r;
L: read (B); y := f(B); s := 0;
  for i := 1 step 1 until n do
  begin A[i] := A[i] + d;
    punch (A); read (B);
    P[i] := y - f(B); s := s + P[i]↑2;
    A[i] := A[i] - d
  end;
  r := d/sqrt(s);
  for i := 1 step 1 until n do
    A[i] := r × P[i] + A[i];
  punch (A); read (B);
  if abs(y - f(B)) > ε then go to L else
  for i := 1 step 1 until n do
    A[i] := A[i] - r × P[i];
  punch (A);
  go to L
end.

In this program the quantity d is the steepest-descent step, and the quantity ε is the accuracy with which the minimal value of the criterion f is to be reached. The function sqrt(s) is the square root of s taken with a plus sign. The array P[1:n] gives the relative magnitudes of the optimal increments of the control actions A[1:n] at each step of the steepest-descent process. It is assumed that the following were introduced into the algorithm beforehand, prior to the statement labeled L: the quantities d and ε, the real procedures f(B) and sqrt(s), and the initial values of the components of the array A[1:n].
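The following Python sketch mirrors the structure of this regulator. It probes each control action by d, records the resulting drop in f as P[i], and then takes one step of length d along P. Assumptions: the plant is replaced by a hypothetical function measure(A) returning B, standing in for punch(A); read(B). The rule "stop and undo the last step when it improves f by no more than eps" is chosen here so that the sketch terminates; the regulator in the text keeps returning to label L. descent_cycle, regulate and quadratic are illustrative names.

```python
import math

def descent_cycle(A, measure, f, d):
    """One pass of the regulator; A is modified in place.
    Returns (f before the step, f after the step)."""
    y = f(measure(A))                  # y := f(B) at the current controls
    P = []
    for i in range(len(A)):
        A[i] += d                      # trial increment of the i-th action
        P.append(y - f(measure(A)))    # P[i] := y - f(B): drop of f along axis i
        A[i] -= d                      # restore the i-th action
    s = sum(p * p for p in P)
    if s == 0.0:
        return y, y                    # no preferred direction here
    r = d / math.sqrt(s)               # the whole step has length d
    for i in range(len(A)):
        A[i] += r * P[i]
    return y, f(measure(A))

def regulate(A, measure, f, d, eps, max_cycles=1000):
    """Repeat cycles until a step improves f by no more than eps (assumed rule)."""
    A = list(A)
    for _ in range(max_cycles):
        previous = list(A)
        y_before, y_after = descent_cycle(A, measure, f, d)
        if y_before - y_after <= eps:
            A = previous               # the last step did not pay off: undo it
            break
    return A

def quadratic(B):
    # Hypothetical criterion with its minimum at the point (1, 2).
    return (B[0] - 1.0) ** 2 + (B[1] - 2.0) ** 2

controls = regulate([0.0, 0.0], measure=lambda A: list(A), f=quadratic,
                    d=0.05, eps=1e-9)
```

Because the step length is fixed at d, the controls settle only within roughly d of the optimum. A smaller d gives a finer result but needs more cycles.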

Now let us consider the algorithm which performs the operation of a discrete α-perceptron P. Let us assume that the perceptron P has a retina consisting of N receptors and is designed to recognize two patterns. The A-elements are (1, 1, 1)-neurons. The reward constant is equal to unity, and the penalty constant is equal to zero.

The image projected onto the retina is the Boolean array r[1:N], which is read in from outside each time a new image is shown to the perceptron. Also read in from outside is the Boolean quantity a. This is the signal indicating whether the response p given by the perceptron is correct (true) or incorrect (false). The response p is simply the number of the pattern to which the perceptron assigns each image shown to it. We use sf and ss to designate the output signals of the summators of the first and second patterns.

Let us assume further that the first and the second pattern each have n neurons. For the ith neuron of the first pattern, xf[i] and yf[i] denote the numbers of the retina receptors to which its exciting and inhibiting inputs, respectively, are connected. Likewise, xs[i] and ys[i] denote the corresponding receptor numbers for the ith neuron of the second pattern. Finally, vf[i] and vs[i] denote the weights of the ith neurons of the first and second patterns (i = 1, 2, ..., n). With these assumptions, the algorithm which performs the work of the perceptron P (in the learning regime) can be written in the form

begin integer p, i, j, k; real sf, ss; Boolean a;
  Boolean array r[1:N];
L: read (r); sf := 0; ss := 0;
  for i := 1 step 1 until n do
  begin if r[xf[i]] ∧ ¬r[yf[i]] then
      sf := sf + vf[i];
    if r[xs[i]] ∧ ¬r[ys[i]] then
      ss := ss + vs[i]
  end;
  if sf > ss then p := 1;
  if sf < ss then p := 2;
  if sf = ss then go to L;
  punch (p); read (a);
  if a ∧ (p = 1) then for j := 1 step 1 until n do
    if r[xf[j]] ∧ ¬r[yf[j]] then vf[j] := vf[j] + 1;
  if a ∧ (p = 2) then for k := 1 step 1 until n do
    if r[xs[k]] ∧ ¬r[ys[k]] then vs[k] := vs[k] + 1;
  go to L
end.

It is assumed that the arrays xf[1:n], yf[1:n], xs[1:n], ys[1:n], vf[1:n] and vs[1:n] are introduced into the program beforehand. It is further assumed that the last two arrays (the weights of the neurons of the first and second patterns) do not consist entirely of zeros. Otherwise both pattern summators would always output zero (sf = ss = 0). The program assumes that the perceptron gives no response p when the summator signals are equal. So learning, and indeed any change of the weights, could never take place.
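The same learning step can be sketched in Python for readers who want to experiment. Assumptions: receptor numbers are 0-based list indices. The correctness signal a is produced by comparing the response with a known true pattern number, which stands in for the teacher. The names a_unit, respond and learn_step are illustrative. As in the ALGOL program, a tie sf = ss produces no response and no weight change. Only a correct response is rewarded (reward constant 1), and an incorrect one changes nothing (penalty constant 0).

```python
def a_unit(r, exc, inh):
    """(1, 1, 1)-neuron: fires iff its exciting receptor is lit and its
    inhibiting receptor is dark."""
    return r[exc] and not r[inh]

def respond(r, xf, yf, vf, xs, ys, vs):
    """Return (p, active_f, active_s); p is 1, 2, or None when sf = ss."""
    active_f = [i for i in range(len(vf)) if a_unit(r, xf[i], yf[i])]
    active_s = [i for i in range(len(vs)) if a_unit(r, xs[i], ys[i])]
    sf = sum(vf[i] for i in active_f)   # summator of the first pattern
    ss = sum(vs[i] for i in active_s)   # summator of the second pattern
    if sf == ss:
        return None, active_f, active_s
    return (1 if sf > ss else 2), active_f, active_s

def learn_step(r, true_pattern, xf, yf, vf, xs, ys, vs):
    """Show image r once; the weights vf, vs are updated in place."""
    p, active_f, active_s = respond(r, xf, yf, vf, xs, ys, vs)
    if p is None:
        return None                     # corresponds to 'go to L'
    a = (p == true_pattern)             # stand-in for the teacher's signal a
    if a and p == 1:
        for i in active_f:
            vf[i] += 1                  # reward constant 1
    if a and p == 2:
        for i in active_s:
            vs[i] += 1
    return p                            # penalty constant 0: nothing else changes

# Hypothetical 4-receptor retina with two A-elements per pattern.
xf, yf, vf = [0, 2], [1, 3], [1, 1]
xs, ys, vs = [1, 2], [0, 3], [1, 1]
p1 = learn_step([True, False, False, False], 1, xf, yf, vf, xs, ys, vs)
p2 = learn_step([False, True, False, False], 1, xf, yf, vf, xs, ys, vs)
```

Here the first image is classified correctly and rewarded. The second is assigned to pattern 2 while the teacher says 1, so no weight changes.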

This last limitation can be avoided if the teacher does not simply supply a reward signal but tells the perceptron the true number of the pattern to which the image being shown belongs. This is precisely how the perceptron functions in the learning regime considered in the preceding chapter. Let us indicate the changes which must be made in the program above for this new type of signal a.

The quantity a must be declared to be of integer type rather than Boolean. The program changes reduce to the following. After the statement if sf < ss then p := 2, the statement if sf ≠ ss then punch(p) must be used in place of the statement if sf = ss then go to L. Then, in the conditional statements following read(a), the conditions if a ∧ (p = 1) and if a ∧ (p = 2) must be replaced by the conditions if a = 1 and if a = 2, respectively.

It is also not difficult to describe the changes needed in the original perceptron program to simulate the self-learning regime rather than the learning regime. To do this, the quantity a is removed from the program entirely, together with the statement read(a). In the conditional statements that follow, the terms ∧ (sf > ss) and ∧ (sf < ss), respectively, must be added to the conditions standing after the service words do.

Thus, in both the learning and the self-learning regimes, α-perceptrons can easily be simulated in ALGOL and, consequently, on universal electronic digital machines. It is easy to see that the same holds for any modifications and generalizations of the perceptron circuit.

Let us now use ALGOL to describe still another self-organizing algorithmic system, one which simulates biological evolution and the formation of new species. Letichevskiy [49] has modeled such a system (although a somewhat different one) on a universal electronic digital machine.

Let us consider a discrete space consisting of a finite set of points numbered from 1 to n inclusive. For simplicity, let us assume that this set is cyclically ordered. In other words, for each point i we define its two neighbors: the point bi directly preceding it and the point fi directly following it. If i ≠ 1, then bi = i - 1; for i = 1 we set bi = n. Similarly, if i ≠ n, then fi = i + 1, and for i = n we set fi = 1.
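In code, the cyclic neighbors can be computed either by the case analysis above or, equivalently, by modular arithmetic on 1-based point numbers. The small Python sketch below (names illustrative) checks the two forms against each other. It also applies the same wrap-around to a motion m, an assumption about how a displacement past point n or before point 1 would be handled.

```python
def neighbours(i, n):
    """Case analysis from the text: (bi, fi) for point i of 1..n."""
    bi = n if i == 1 else i - 1
    fi = 1 if i == n else i + 1
    return bi, fi

def neighbours_mod(i, n):
    """The same pair via modular arithmetic on 1-based numbers."""
    return (i - 2) % n + 1, i % n + 1

def move_target(i, m, n):
    """Point reached from i by a motion m in {-1, 0, 1} on the cycle."""
    return (i + m - 1) % n + 1
```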

To every point i of the space we assign a state s[i], which can take any integral value from 0 to k inclusive. If s[i] = 0, the corresponding point is considered "lifeless." If, however, s[i] ≠ 0, then we assume that at the point i there is some "living being" in the state s[i]. As such "living beings" in this model we take abstract automata with the same number of states (equal to k) but, generally speaking, with different transition and output tables.

In addition, for each point i of our space we are given the number F[i], equal to 1 or 0 according to whether or not there is "food" at the ith point. Where "food" is present, its supply is assumed to be so large (or self-replenishing) that the automaton located at that point practically does not deplete it while "feeding." The array F is altered at each successive cycle of the algorithm in accordance with some "law of nature," which we assume to lie outside our algorithm.

Besides its state s[i], we associate two other numbers with the automaton occupying the point i: the reading L[i] of its "life" counter and the reading H[i] of its so-called "hunger" counter. The quantity L[i] increases by unity at each cycle of the algorithm. Once its value exceeds a prespecified threshold l, the corresponding automaton passes into the zero state, i.e., simply speaking, it is destroyed (thereby simulating natural death).

The quantity H[i] increases by unity if F[i] = 0, i.e., when the automaton is located at a point of the space without "food." In the opposite case it decreases by unity, without, however, taking negative values: when H[i] = 0 we set H[i] - 1 = 0. When H[i] exceeds some level h fixed in advance, the corresponding automaton passes into the zero state (thereby simulating death from hunger).
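The counter rules for a single automaton can be sketched as one Python function. The name and signature are illustrative, and l and h are the life and hunger thresholds. "Exceeds" is read as strictly greater than, which is an interpretation.

```python
def update_counters(life, hunger, food_here, l, h):
    """One cycle's change of the 'life' and 'hunger' counters.

    Returns (life, hunger, alive). Exceeding l (old age) or h (hunger)
    means the automaton passes into the zero state.
    """
    life += 1
    if food_here == 0:
        hunger += 1
    else:
        hunger = max(hunger - 1, 0)    # H[i] - 1 is taken as 0 when H[i] = 0
    alive = life <= l and hunger <= h
    return life, hunger, alive
```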

The input signals of the automaton located at the point i are the states s[bi] and s[fi] of the neighboring points. They also include the signals F[bi], F[i] and F[fi], which indicate the presence or absence of food at the point i itself and at the neighboring points. The output signal m is the so-called motion of the automaton. In other words, m is the increment of the number of the spatial point the automaton occupies during the given operating cycle. We shall consider that m can take only three values: 0, 1 and -1.

Since there are five input channels, the transition and output tables of each automaton can be specified as six-dimensional arrays. To specify the transition and output tables of all the automata at once, we use the seven-dimensional arrays SP[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1] and M[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1], respectively. The indices of each array have the following meaning:

- the first is the number of the cell i occupied by the automaton;
- the second is the state s[i] of this automaton;
- the third and fourth are the states s[bi] and s[fi] of the neighboring points;
- the fifth, sixth and seventh are the signals F[i], F[bi] and F[fi], which indicate the presence or absence of "food" at the point itself and at the neighboring points.

We impose no limitations on the motion given by the output table M in the table itself. However, the motion is actually performed only when the point to which the automaton is to be shifted is not occupied by another automaton.

The readings of the "life" and "hunger" counters are changed after the motion has been performed and the automaton has passed into its new state. Suppose that the automaton does not "die" at this point and that its motion is nontrivial, i.e., it does not remain at its previous location. Then, provided certain additional conditions are fulfilled, "reproduction" of the automaton by fission takes place. The shifted automaton A completely retains its structure, except that its "life" counter is set to zero. At the place previously occupied by A there appears its "double." The double differs from A only in the two arrays which specify the transitions and outputs of A: in each of them, one number (respectively the new state or the motion of the automaton) is replaced by a random number. The "life" counter of the new automaton is also set to zero.

The additional conditions for reproduction mentioned above are the following. The reading of the "life" counter must lie between two numbers ll and lu fixed a priori, and the reading of the "hunger" counter must not exceed some number hu, also fixed in advance.

To form the random numbers we fix the special integer procedure random(a, b), which at each call delivers some integer lying in the closed segment [a, b]. All the integers of this segment are considered equally probable. Methods of constructing such procedures were described above.

In one operating cycle, the algorithm we have described must scan all the points of our space, performing at them the changes listed above. After finishing each cycle, the algorithm must read in the new values of all the components of the array F[1:n] and begin the next cycle. We shall count the cycles with the aid of the special quantity t. When t reaches a value fixed in advance, the algorithm must terminate.

To make the algorithm easier to program in ALGOL, we introduce three blocks. They describe, respectively, the movement of the automaton located at the ith point, its "death," and its "reproduction." For brevity let us denote these blocks by B1, B2 and B3, and write the corresponding ALGOL program for each of them.

The block B1:

begin integer m, j, bs, fs, f, bf, ff;
  m := M[i, s[i], s[bi], s[fi], F[i], F[bi], F[fi]];
  pi := if i + m > n then 1 else if i + m < 1 then n else i + m;
  if s[pi] ≠ 0 then pi := i;
  comment pi is the number of the point to which the considered automaton is displaced;
  L[pi] := L[i] + 1;
  H[pi] := if F[pi] = 0 then H[i] + 1 else if H[i] ≠ 0 then H[i] - 1 else 0;
  if pi ≠ i then
  for j := 1 step 1 until k do
  for bs := 0 step 1 until k do
  for fs := 0 step 1 until k do
  for f := 0, 1 do for bf := 0, 1 do for ff := 0, 1 do
  begin SP[pi, j, bs, fs, f, bf, ff] := SP[i, j, bs, fs, f, bf, ff];
    M[pi, j, bs, fs, f, bf, ff] := M[i, j, bs, fs, f, bf, ff]
  end;
  s[pi] := SP[i, s[i], s[bi], s[fi], F[i], F[bi], F[fi]]
end.

The block B1 accomplishes the displacement of the automaton from the point i to the point pi, its transition into the new state (defined by the situation at the moment the automaton is located at the point i), the change of the readings of the "life" and "hunger" counters, and the rewriting of the arrays which specify the switching and output functions of the automata, so as to bring them into correspondence with the new location of the considered automaton. The values of i and pi are retained on exit from the block.
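The core arithmetic of the movement block can be sketched in Python as follows. This is a minimal illustration, assuming (as the main program's computation of fi and bi indicates) that the n points form a ring numbered 1..n, that the output m lies in {-1, 0, 1}, and that s[p] = 0 marks an empty point. The function names move_target and update_hunger are hypothetical.

```python
def move_target(i: int, m: int, n: int, s: list[int]) -> int:
    """Return the point pi to which the automaton at point i moves.

    Points are numbered 1..n on a ring; s[p] == 0 means point p is empty.
    s is indexed 1..n (s[0] is unused).
    """
    pi = i + m
    if pi > n:          # wrap past the last point
        pi = 1
    elif pi < 1:        # wrap before the first point
        pi = n
    if s[pi] != 0:      # target occupied: the automaton stays where it is
        pi = i
    return pi


def update_hunger(h_old: int, food_at_target: int) -> int:
    """Hunger grows where there is no food (F = 0); otherwise it falls, but not below 0."""
    if food_at_target == 0:
        return h_old + 1
    return h_old - 1 if h_old != 0 else 0
```

For example, with n = 5 an automaton at point 5 that outputs m = 1 moves to point 1 if that point is free.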

The block B2 is very simple: begin s[pi] := 0 end.

We note that the program will be constructed so that the values of L[pi] and H[pi] which remain at the point pi after the death of the automaton, and the values of the corresponding components of the arrays SP and M, cannot lead to errors in the future. This is achieved because, on repetition of the program, the listed quantities are always defined anew before being used, since they of necessity occur first in the left parts of the corresponding assignment statements.

The  "reproduction"  block 

begin integer rj, rbs, rfs, rf, rbf, rff, j, bs, fs, f, bf, ff;
   Boolean B;
   rj := random(1, k);
   rbs := random(0, k);
   rfs := random(0, k);
   rf := random(0, 1);
   rbf := random(0, 1);
   rff := random(0, 1);
   for j := 1 step 1 until k do
   for bs := 0 step 1 until k do
   for fs := 0 step 1 until k do
   for f := 0, 1 do for bf := 0, 1 do for ff := 0, 1 do
   begin B := j = rj ∧ bs = rbs ∧ fs = rfs ∧ f = rf ∧ bf = rbf ∧ ff = rff;
      SP[i, j, bs, fs, f, bf, ff] := if B then random(1, k)
         else SP[pi, j, bs, fs, f, bf, ff];
      M[i, j, bs, fs, f, bf, ff] := if B then random(-1, 1)
         else M[pi, j, bs, fs, f, bf, ff]
   end
end.
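The effect of the "reproduction" block is that the offspring receives a copy of the parent's switching table SP and output table M in which exactly one entry, chosen at random, is replaced by a random value. A Python sketch of that idea, assuming each table is stored as a dict keyed by the argument tuple (j, bs, fs, f, bf, ff); the names all_keys and reproduce, and the use of random.Random, are illustrative choices rather than the book's.

```python
import itertools
import random


def all_keys(k: int):
    """All argument tuples (j, bs, fs, f, bf, ff), mirroring the loops of the block."""
    return list(itertools.product(range(1, k + 1), range(k + 1), range(k + 1),
                                  (0, 1), (0, 1), (0, 1)))


def reproduce(sp_parent: dict, m_parent: dict, k: int, rng: random.Random):
    """Copy the parent's tables, mutating the single entry at a random key."""
    rj = rng.randint(1, k)
    rbs, rfs = rng.randint(0, k), rng.randint(0, k)
    rf, rbf, rff = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
    mutated = (rj, rbs, rfs, rf, rbf, rff)
    sp_child = dict(sp_parent)               # parent tables stay untouched
    m_child = dict(m_parent)
    sp_child[mutated] = rng.randint(1, k)    # new state in 1..k
    m_child[mutated] = rng.randint(-1, 1)    # new move in {-1, 0, 1}
    return sp_child, m_child, mutated
```

Because random.randint includes both end points, it plays the role the text assigns to random(a, b).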

The entire program for modeling the evolution process can now be written as follows:

begin integer t, i, bi, fi, p, q, pi;
   integer array F[1:n], s[1:n], L[1:n], H[1:n],
      SP[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1],
      M[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1];
   for q := 1 step 1 until n do L[q] := H[q] := 0;
   t := 0; i := 1; read(s); read(SP); read(M);
Q: read(F); fi := if i ≠ n then i + 1 else 1;
   if s[i] = 0 then begin i := fi; go to P end;
   bi := if i ≠ 1 then i - 1 else n;
   |block B1|;
   if L[pi] > l ∨ H[pi] > h then |block B2|;
   if pi ≠ i ∧ H[pi] < hu ∧ L[pi] < lu ∧ L[pi] > ll then |block B3|;
   if pi ≠ fi then i := fi else i := if fi ≠ n then fi + 1 else 1;
P: if i = 1 ∨ pi = 1 then t := t + 1; if t ≠ p then go to Q
end.

We note that the program which we have constructed is not economical from the point of view of memory use or of the necessity of rewriting the multidimensional arrays. We can achieve a far more economical program construction if we introduce a numbering of the automata and use in the arrays L, H, SP and M the number of the corresponding automaton in place of the number of the spatial point.
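A Python sketch of the more economical organization suggested above, assuming each automaton receives a serial number when it is created: tables and counters are stored per automaton number, and a spatial array only records which automaton (if any) occupies each point. Moving an automaton then changes two entries of that array instead of copying the multidimensional tables. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Automaton:
    sp: dict            # transition table, keyed by (j, bs, fs, f, bf, ff)
    m: dict             # output (move) table, same keys
    state: int
    life: int = 0       # "life" counter L
    hunger: int = 0     # "hunger" counter H


@dataclass
class World:
    n: int
    where: list = field(default_factory=list)   # where[p] = automaton number, 0 if empty
    automata: dict = field(default_factory=dict)
    next_id: int = 1

    def __post_init__(self):
        if not self.where:
            self.where = [0] * (self.n + 1)      # points 1..n; index 0 unused

    def place(self, p: int, a: Automaton) -> int:
        aid = self.next_id
        self.next_id += 1
        self.automata[aid] = a
        self.where[p] = aid
        return aid

    def move(self, i: int, pi: int) -> None:
        """Move the automaton at point i to point pi without copying its tables."""
        if pi != i:
            self.where[pi], self.where[i] = self.where[i], 0

    def kill(self, p: int) -> None:
        """Analogue of the "death" block: free the point and drop the record."""
        aid = self.where[p]
        self.where[p] = 0
        self.automata.pop(aid, None)
```

The saving comes from the indirection: only the small where list is indexed by spatial point.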

In the actual modeling of the evolutionary process on a universal electronic digital computer in [43], a program with more limited capabilities was used; nevertheless, the experiments conducted showed that even under these conditions the quality of the simulation was quite satisfactory. For relatively simple "laws of nature," the adaptation of the automata to the surrounding medium and the formation of stable "species" were observed after several tens of thousands of cycles of operation of the algorithm and the replacement of the corresponding number of "generations." Initially the transition and output tables of the automata (arrays SP and M, respectively, in our case) were specified arbitrarily. In the course of evolution the poorly arranged automata "died out," and forms appeared which were better adapted for "life" under the given conditions.
[Footnotes]

The diode matrices are two systems of conductors, usually termed buses, some of which are interconnected by diodes, i.e., by elements which pass current in only one direction.

An attempt is usually made to avoid such roundoffs in ALGOL, since natural roundoff of the quantity 23.5 leads to 23 on some machines and to 24 on others.

If the step B is negative, then the value of the expression C - A is taken with reversed sign.

For a description of the SMALGOL-61 language see: Communications of the Assoc. for Comp. Mach., 1961, Vol. 4, No. 11, pages 499-502.
Chapter 6

PREDICATE  CALCULUS  AND  THE  PROBLEM  OF  AUTOMATION  OF 
THE  SCIENTIFIC  CREATIVE  PROCESSES 

§1.  BASIC  CONCEPTS  OF  PREDICATE  CALCULUS 

As we mentioned in Chapter 2, the simplest component of mathematical logic, propositional calculus, does not penetrate into the structure of the elementary propositions, which limits its capabilities for formalizing the more complex thought processes. The next higher stage of mathematical logic with regard to complexity, termed restricted predicate calculus or first-order predicate calculus, possesses far stronger expressive capabilities.

One characteristic feature of predicate calculus is, first of all, that along with the variable propositions, which can take only two possible values ("true" and "false"), there are introduced into consideration the so-called object variables, which run through some (generally speaking, infinite) region of values customarily termed the object region. The values composing this region are usually termed objects.

Fixing a particular object region, we obtain the possibility of constructing propositional functions of the object variables, usually termed predicates: the n-place predicate $P(x_1, x_2, \ldots, x_n)$ is a variable proposition whose truth or falsity is determined by the set of values of the object variables $x_1, x_2, \ldots, x_n$. If the predicate P is neither identically true nor identically false, then on some sets of values of the object variables it takes the value "true" and on others the value "false."

In the classical theory of predicates only the single-place predicates were called predicates (or properties). For the multiplace predicates the special term "relation" was used: the two-place predicates were termed binary relations, the three-place ones ternary relations, etc. For our purposes there is no need to emphasize this difference; therefore we shall term predicates any functions of any (greater than zero) number of object variables.

The use of predicates permits the construction of a formal language analogous to propositional calculus but, in contrast with it, penetrating into the structure of the elementary propositions. For example, the proposition "four is larger than two" is indecomposable in propositional calculus. However, if we introduce the predicate P(x, y), with the set of nonnegative integers as the object region, which is true if and only if the inequality x > y is satisfied, then this proposition is written in the form P(4, 2), which now gives an idea of its internal structure.

The internal structure of the proposition "oxygen is a gas" can be revealed in exactly the same way. To do this it is sufficient to introduce the predicate "is a gas," which takes the value "true" if and only if there is substituted in it an object of the object region which actually is a gas. If we designate this predicate by Q(x), then the phrase above can be written in the form Q(oxygen).
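Viewed computationally, a predicate on a fixed object region is simply a Boolean-valued function of the object variables. A small Python sketch of the two examples above; the set GASES, standing in for "the objects which actually are gases," is an illustrative assumption.

```python
def P(x: int, y: int) -> bool:
    """Two-place predicate on the nonnegative integers: true iff x > y."""
    return x > y


GASES = {"oxygen", "nitrogen", "helium"}   # illustrative sample of the object region


def Q(x: str) -> bool:
    """One-place predicate "is a gas"."""
    return x in GASES


print(P(4, 2))        # the proposition "four is larger than two": True
print(Q("oxygen"))    # the proposition "oxygen is a gas": True
```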

In the formal construction of predicate calculus we are not usually interested in the exact objects of which a particular object region is constituted; it is sufficient to know only the number of these objects or, more precisely, the power (cardinality) of the set of all the objects composing the object region. If the object region is finite or countable, the objects composing it can be replaced by their numbers. Thereby the object regions are reduced to number sets, which facilitates the concrete expression of the corresponding predicates. Since the predicates are variable propositions, all the operations used in Chapter 2 in the construction of propositional calculus can be applied to them. At the same time the use of the object variables permits the introduction of several new operations which are specific to predicate calculus. These operations are constructed with the aid of the so-called quantifiers. Usually we limit ourselves to only two forms of quantifiers, termed existential quantifiers and generality quantifiers. For their designation we shall use the symbols $\exists x$ and $\forall x$ respectively, where x indicates the variable on which the quantifier acts.

The expression $\exists x\,P(x)$ is the conventional designation for the proposition "there exists an object x for which the predicate P is true." Similarly the expression $\forall x\,P(x)$ designates the proposition "for all objects x the predicate P is true." Here it is understood that the objects under discussion belong to a particular fixed object region M. If the region M consists of a finite number of objects $x_1, x_2, \ldots, x_k$, the expression $\exists x\,P(x)$ reduces to the disjunction $P(x_1) \vee P(x_2) \vee \dots \vee P(x_k)$, and the expression $\forall x\,P(x)$ reduces to the conjunction $P(x_1) \wedge P(x_2) \wedge \dots \wedge P(x_k)$. In the case of an infinite object region this reduction is not possible, since the constructive nature of our constructions excludes the use of infinite disjunctions and conjunctions.
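On a finite object region the reduction just described can be carried out mechanically. In this Python sketch the region M is modeled as a list, and the quantifiers become the built-ins any (a repeated disjunction) and all (a repeated conjunction); the function names exists and forall are illustrative.

```python
from typing import Callable, Iterable


def exists(region: Iterable, p: Callable[[object], bool]) -> bool:
    """∃x P(x) on a finite region: P(x1) ∨ P(x2) ∨ ... ∨ P(xk)."""
    return any(p(x) for x in region)


def forall(region: Iterable, p: Callable[[object], bool]) -> bool:
    """∀x P(x) on a finite region: P(x1) ∧ P(x2) ∧ ... ∧ P(xk)."""
    return all(p(x) for x in region)


M = [1, 2, 3, 4]
print(exists(M, lambda x: x % 2 == 0))   # True: 2 is even
print(forall(M, lambda x: x % 2 == 0))   # False: 1 is odd
```

Applied to an infinite generator, any or all stops only on meeting a witness or a counterexample. This reflects why the reduction is unavailable for infinite regions.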

In the nonconstructive (so-called set-theoretic) approach to the construction of predicate calculus, we can always picture the expressions $\exists x\,P(x)$ and $\forall x\,P(x)$ as a disjunction and a conjunction extended to all the objects x composing the given object region M. In the constructive approach this representation can be used only for heuristic (inductive) reasoning and constructions, but not at all as a method of strict formal proof.

In the expressions $\exists x\,P(x)$ and $\forall x\,P(x)$ the variable x is bound by the corresponding quantifier. In contrast with the free (unbound) variables, for example the variables x and y in the expression Q(x, y), the bound variables do not have independent (individual) values, since the proposition containing the bound variables actually does not depend on these variables. In this sense the role of the bound variables in predicate calculus is completely analogous to the role played by the index i in the sum $\sum_{i=1}^{n} a_i$ or by the integration variable x in the definite integral $\int_a^b f(x)\,dx$. In particular, we can replace a bound variable with any other variable without altering the sense or value of the corresponding expression.

Any formula of restricted predicate calculus is constructed from elementary propositions with the aid of the four operations of propositional calculus (negation, disjunction, conjunction and implication) and the two binding operations effected by the object quantifiers (generality and existential). The elementary propositions are usually the familiar variable propositions (propositional letters) and the propositional functions (predicates) defined above. Here the object region is assumed fixed, and the total number of symbols composing any formula must of necessity be finite. For uniformity of terminology, the elementary variable propositions which do not depend on the object variables (i.e., the propositional letters) are conveniently considered as zero-place predicates (propositional functions of an empty set of object variables).

Just as in propositional calculus, in predicate calculus we can use round brackets to designate the order of operations in the formulas. These brackets are also used to establish the action region of the quantifiers which appear in the formula. For example, in the expression $\exists x\,(P(x, y) \supset Q(x)) \vee R(x)$ the action region of the existential quantifier $\exists x$ includes only the expression $P(x, y) \supset Q(x)$. The variable x appearing in this expression is bound by the indicated quantifier, while the variable x in the predicate R(x) must be considered a free variable.
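The distinction between free and bound occurrences can be made precise by a short recursive function over formulas. In this Python sketch, formulas are nested tuples, an encoding chosen only for illustration: ("pred", name, vars), ("not", A), ("or"/"and"/"imp", A, B), and ("exists"/"forall", x, A). It encodes the example $\exists x\,(P(x, y) \supset Q(x)) \vee R(x)$, reading its outer connective as a disjunction.

```python
def free_vars(f) -> set:
    """Return the set of object variables occurring free in formula f."""
    tag = f[0]
    if tag == "pred":
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("or", "and", "imp"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("exists", "forall"):
        return free_vars(f[2]) - {f[1]}   # the quantifier binds its variable
    raise ValueError(f"unknown formula tag {tag!r}")


# ∃x(P(x, y) ⊃ Q(x)) ∨ R(x)
formula = ("or",
           ("exists", "x", ("imp", ("pred", "P", ("x", "y")), ("pred", "Q", ("x",)))),
           ("pred", "R", ("x",)))
print(sorted(free_vars(formula)))   # ['x', 'y']: y in P, and the x of R(x)
```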

In order to avoid confusion during the various transformations of formulas in predicate calculus, we usually prefer to redesignate the bound variables so that their notations differ both from one another and from the notations of all the free variables appearing in the same formula. In this case we can consider that the action region of each quantifier extends from the place of its occurrence to the very end of the formula, which makes the use of brackets to designate action regions superfluous. Hereafter we shall, as a rule, adhere to precisely this interpretation of the action regions of the quantifiers.

We note that in restricted predicate calculus only the object variables may be bound by quantifiers. The predicates appearing in a formula are assumed to be fixed. Such a limitation naturally restricts the region of applications of the logical calculus which we are constructing, which explains the inclusion of the term "restricted" in its name. In the so-called extended predicate calculus use is made of variable predicates and predicate quantifiers. In other words, expressions of the form "P(x) is valid for every predicate P" or "there exists a predicate Q for which the proposition Q(x) is true," etc., are permitted. With unlimited use of predicate quantifiers it becomes possible to construct internally contradictory formulas, and paradoxes appear. All this makes further complication of the corresponding calculi necessary. However, we shall not concern ourselves with a detailed study of the extended predicate calculus, but shall concentrate our attention on the restricted predicate calculus. Therefore hereafter, when we use the term predicate calculus (unless otherwise stipulated), we shall always mean the restricted predicate calculus. We also shall not consider other possible generalizations of the predicate calculus, for example predicate calculus with several object regions rather than only one.

Just as in the case of propositional calculus, in the construction of predicate calculus it is not sufficient to indicate only the method of writing the formulas. It is also necessary to give the rules for the transformation of the formulas, expressed by axioms. The axioms of predicate calculus include all 11 axioms of propositional calculus which were given in §5 of Chapter 2. In addition to them, four postulates specific to predicate calculus are introduced, to which we assign the numbers 12 to 15 inclusive:

$$\text{12.}\ \frac{C \supset P(x)}{C \supset \forall x\,P(x)} \qquad \text{13.}\ \forall x\,P(x) \supset P(t)$$

$$\text{14.}\ P(t) \supset \exists x\,P(x) \qquad \text{15.}\ \frac{P(x) \supset C}{\exists x\,P(x) \supset C}$$

Postulates 12 and 15 are rules of inference (the formula below the line may be deduced from the one above it); in them the formula C must not contain x as a free variable.

With the aid of these postulates (axioms) we can perform the formal deduction of new formulas by exactly the same method as in the case of propositional calculus. We note only that the expressions P(x) and P(t) in axioms 12-15 must be understood not only as elementary one-place predicates, but also as any formulas of predicate calculus containing the letters x and t as free variables. It is not excluded here that other free variables may appear in the corresponding formulas. In the formal axiomatic construction of predicate calculus we do not usually use the symbols of individual objects or individual predicates, so that the predicates appearing in the formulas are taken to be arbitrary, not fixed, predicates. In place of the propositional letters A, B, C in axioms 1-15 any formulas of predicate calculus can be substituted, including those which contain free variables.

In all the substitutions which we are discussing here it is understood that the free and bound variables, and also the various bound variables, in the formulas obtained as a result of the substitutions must be designated by different letters. This condition permits avoiding the so-called collision of variables, which leads to unforeseen binding of variables which must be left free. Indeed, if, say, in the formula $\exists x\,P(x) \supset C$ we substitute the formula Q(x) in place of C, then the existential quantifier $\exists x$ would bind not only the variable in the predicate P, but also the variable in the predicate Q. Collision of variables can always be avoided by renaming the bound variables. Hereafter, whenever such renaming is necessary, we shall always assume that it has been accomplished.
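Renaming bound variables so that each quantifier gets its own letter, distinct from every other variable in the formula, is itself a simple mechanical procedure. A Python sketch using an illustrative tuple encoding (("pred", name, vars), ("not", A), ("or"/"and"/"imp", A, B), ("exists"/"forall", x, A)); the fresh names v1, v2, ... are a hypothetical naming scheme, checked against the variables already used in the formula.

```python
import itertools


def all_vars(f) -> set:
    """Every variable letter occurring in f, free or bound."""
    tag = f[0]
    if tag == "pred":
        return set(f[2])
    if tag == "not":
        return all_vars(f[1])
    if tag in ("or", "and", "imp"):
        return all_vars(f[1]) | all_vars(f[2])
    return {f[1]} | all_vars(f[2])          # quantifier node


def rename_apart(f):
    """Give every quantifier a fresh bound variable not used anywhere else in f."""
    used = all_vars(f)
    counter = itertools.count(1)

    def fresh():
        while True:
            name = f"v{next(counter)}"
            if name not in used:
                used.add(name)
                return name

    def walk(g, env):
        tag = g[0]
        if tag == "pred":
            return ("pred", g[1], tuple(env.get(v, v) for v in g[2]))
        if tag == "not":
            return ("not", walk(g[1], env))
        if tag in ("or", "and", "imp"):
            return (tag, walk(g[1], env), walk(g[2], env))
        new = fresh()                        # quantifier: rebind its variable
        return (tag, new, walk(g[2], {**env, g[1]: new}))

    return walk(f, {})


# ∃x P(x) ⊃ Q(x): after renaming, the free x of Q(x) can no longer be captured.
print(rename_apart(("imp", ("exists", "x", ("pred", "P", ("x",))), ("pred", "Q", ("x",)))))
```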

Under the condition that the necessary precautionary measures are taken to avoid collision of variables, all the results on the deducibility of some formulas from others obtained previously in propositional calculus (see formulas 1-7 in §5 of Chapter 2) carry over to predicate calculus. The deduction theorem (with corresponding stipulations) also remains valid in predicate calculus. In particular, if the formula B is deducible (in predicate calculus) from the formula A, then the formula $A \supset B$ will be deducible in predicate calculus (provided that measures are taken to prevent collision of variables).

The following rules of deducibility are easily derived from axioms 12-15:

insertion of the generality quantifier ($\forall$-insertion)
$$A(x) \overset{x}{\vdash} \forall x\,A(x); \tag{115}$$

insertion of the existential quantifier ($\exists$-insertion)
$$A(t) \vdash \exists x\,A(x); \tag{116}$$

removal of the generality quantifier ($\forall$-removal)
$$\forall x\,A(x) \vdash A(t); \tag{117}$$

the so-called $\exists$-removal: if $\Gamma, A(x) \vdash C$, then (see Kleene [42])
$$\Gamma, \exists x\,A(x) \overset{x}{\vdash} C. \tag{118}$$

The symbol of the variable x written above the deducibility symbol $\vdash$ means that the corresponding variable is varied (converted from a free variable to an apparent, i.e., bound, variable) in the process of the deduction in accordance with the deduction rules 12 and 15. The free variables which are not varied in the deduction process are customarily termed fixed variables. This last concept can be used to refine the formulation of the deduction theorem.

If $\Gamma, A \vdash B$ holds, and in the deduction process the free variables occurring in the formula A remain fixed, then $\Gamma \vdash A \supset B$ also holds.

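Using the equivalence symbol $\sim$ in the same sense as in propositional calculus, and using A to denote any formula not containing the free variable x, we can easily establish the following relations:

$$\vdash \forall x\,A \sim A, \qquad \vdash \exists x\,A \sim A; \tag{119}$$

$$\vdash \forall x\,\forall y\,P(x, y) \sim \forall y\,\forall x\,P(x, y); \tag{120}$$

$$\vdash \exists x\,\exists y\,P(x, y) \sim \exists y\,\exists x\,P(x, y); \tag{121}$$

$$\vdash \forall x\,P(x) \supset \exists x\,P(x); \tag{122}$$

$$\vdash \exists x\,\forall y\,P(x, y) \supset \forall y\,\exists x\,P(x, y). \tag{123}$$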

Rules (120) and (121) show that the order of application of like quantifiers can be varied. For unlike quantifiers this is not the case, since the relation $\vdash \forall x\,\exists y\,P(x, y) \supset \exists y\,\forall x\,P(x, y)$, which is essentially the converse of relation (123), does not in general hold in predicate calculus. To convince ourselves of this it is sufficient to take as the object region the set of all natural numbers, and as P(x, y) the predicate which is true if and only if x < y. Then the formula $\forall x\,\exists y\,P(x, y)$ expresses the proposition "for every natural number x there exists a natural number y which is larger than x." At the same time the formula $\exists y\,\forall x\,P(x, y)$ is the proposition "there exists a natural number which is larger than all the natural numbers." The first proposition is true and the second is false. Therefore the proposition $\forall x\,\exists y\,P(x, y) \supset \exists y\,\forall x\,P(x, y)$ is false under the considered interpretation and must not be deducible in a (contensively) consistent calculus.
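The asymmetry between (123) and its converse can also be observed mechanically on a small finite object region. This Python sketch enumerates every two-place predicate on a region of two objects (all $2^4 = 16$ of them). It confirms that $\exists x\,\forall y\,P(x, y) \supset \forall y\,\exists x\,P(x, y)$ holds for each one, while $\forall x\,\exists y\,P(x, y) \supset \exists y\,\forall x\,P(x, y)$ fails for some. A finite check of this kind only illustrates the claim; it proves nothing about infinite regions such as the natural numbers.

```python
import itertools

D = [0, 1]   # a two-object region


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def relations(domain):
    """Yield every two-place predicate on domain as the set of pairs where it is true."""
    pairs = [(x, y) for x in domain for y in domain]
    for bits in itertools.product((False, True), repeat=len(pairs)):
        yield {pr for pr, b in zip(pairs, bits) if b}


def rel_123(R) -> bool:
    lhs = any(all((x, y) in R for y in D) for x in D)      # ∃x∀y P(x, y)
    rhs = all(any((x, y) in R for x in D) for y in D)      # ∀y∃x P(x, y)
    return implies(lhs, rhs)


def converse(R) -> bool:
    lhs = all(any((x, y) in R for y in D) for x in D)      # ∀x∃y P(x, y)
    rhs = any(all((x, y) in R for x in D) for y in D)      # ∃y∀x P(x, y)
    return implies(lhs, rhs)


rels = list(relations(D))
print(len(rels), all(rel_123(R) for R in rels))              # 16 True
print([sorted(R) for R in rels if not converse(R)])          # the counterexamples
```

On two objects the converse fails exactly for the "identity" and "swap" relations, the finite analogues of the natural-number counterexample.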

We understand the contensive consistency of predicate calculus (and of propositional calculus as well) in the sense that only identically true formulas can be deducible in the calculus, i.e., formulas which remain true for any object region and for any concrete interpretation of the predicates occurring in them. Without binding ourselves to the requirement of constructivity of arguments (i.e., remaining in the framework of the set-theoretic approach to predicate calculus), it is easy to see that the formulas expressed by axioms 13 and 14, just as the formulas expressed by axioms 1-10 of propositional calculus, are identically true formulas.
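In the set-theoretic spirit of this paragraph, the identical truth of axioms 13 and 14, $\forall x\,P(x) \supset P(t)$ and $P(t) \supset \exists x\,P(x)$, can be spot-checked on finite object regions. The check runs through every one-place predicate P (every truth table on the region) and every object t. The Python sketch below does this for regions of one to four objects; the function name is illustrative, and the check covers only these finite regions and is not a proof for arbitrary ones.

```python
import itertools


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def check_axioms_13_14(size: int) -> bool:
    region = range(size)
    for bits in itertools.product((False, True), repeat=size):
        P = dict(zip(region, bits))              # one-place predicate as a truth table
        forall_P = all(P[x] for x in region)
        exists_P = any(P[x] for x in region)
        for t in region:
            if not implies(forall_P, P[t]):      # axiom 13: ∀x P(x) ⊃ P(t)
                return False
            if not implies(P[t], exists_P):      # axiom 14: P(t) ⊃ ∃x P(x)
                return False
    return True


print(all(check_axioms_13_14(n) for n in range(1, 5)))   # True
```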

The deduction rules (11, 12 and 15) also lead to identically true formulas provided their premises are identically true. As an example let us consider deduction rule 12. The identical truth of the premise $C \supset P(x)$ can obtain only in two cases: either when the proposition C is false, or when the proposition P(x) is true for every x. It is evident that in both of these cases the proposition $C \supset \forall x\,P(x)$ is also true.
By similar arguments the contensive consistency of predicate calculus is proved (although not completely constructively). The nonconstructivity here is associated with the implicitly assumed possibility of running through all the values of the object variables when determining identical truth, which in the case of an infinite object region requires an infinite number of steps; this contradicts the requirement of finiteness which is mandatory for strictly constructive constructions.

Limiting ourselves to finite object regions only, it is easy to give the proof of the consistency of predicate calculus a completely constructive nature. With this limitation predicate calculus essentially reduces to propositional calculus, since quantifier binding in this case is simply a short form of writing the conjunctions and disjunctions extended to all the objects of the object region, and the relations expressed by axioms 12-15 are deducible from axioms 1-11. Thanks to the possibility of such an interpretation, the question of the consistency of predicate calculus reduces to the corresponding question for propositional calculus, which was resolved earlier.

A similar method is used to establish the formal consistency (also termed simple consistency) of predicate calculus, i.e., the impossibility of deducing in this calculus any formula together with its negation.

The problem of the contensive completeness of predicate calculus, i.e., the possibility of formally deducing in this calculus any identically true formula, was resolved in the positive sense by Gödel [16]. It is obvious that in the case of infinite object regions, establishing the contensive completeness of predicate calculus requires the use of means which go beyond the limits of finite mathematics. In the case of finite object regions the question of the contensive completeness of predicate calculus reduces to the corresponding question for propositional calculus and therefore is resolved constructively.

In contrast with contensive completeness, also termed completeness in the broad sense, completeness in the narrow sense does not hold in predicate calculus. Indeed, the formula $\exists x\,P(x) \supset \forall x\,P(x)$, which is not deducible in this calculus, can be adjoined to the list of axioms of predicate calculus without leading to a contradiction. The consistency of the axiom system obtained by this adjunction becomes clear when we consider an object region consisting of a single object: in this case the newly adjoined axiom becomes an identically true formula. At the same time, for an object region consisting of the two objects x and y, this axiom is converted into the formula $(P(x) \vee P(y)) \supset (P(x) \wedge P(y))$, which is not identically true and therefore is not deducible from the remaining axioms.
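The argument above can be replayed by brute force. For a region of n objects, enumerate all $2^n$ one-place predicates and test $\exists x\,P(x) \supset \forall x\,P(x)$ on each. A Python sketch (the helper name is illustrative):

```python
import itertools


def exists_implies_forall_everywhere(n: int) -> bool:
    """Is ∃x P(x) ⊃ ∀x P(x) true for every one-place predicate P on n objects?"""
    for P in itertools.product((False, True), repeat=n):
        if any(P) and not all(P):    # antecedent true, consequent false
            return False
    return True


print(exists_implies_forall_everywhere(1))   # True: identically true on one object
print(exists_implies_forall_everywhere(2))   # False: e.g. P(x) true, P(y) false
```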

With the set-theoretic approach to the construction of predicate calculus the following interesting theorem due to Mal'tsev [52] can be proved.

Theorem 1. If an infinite disjunction of (finite) formulas of restricted predicate calculus is an identically true formula, then some finite disjunction of these formulas is identically true.

This  theorem  can  be  used  successfully  for  the  proof  of  the  so- 
called  local  theorems,  which  in  several  cases  make  it  possible  to 
transfer  to  the  infinite  sets  the  properties  which  are  valid  for  all 
their  finite  subsets. 

Along with the identically true formulas, in predicate calculus it is useful to consider the so-called satisfiable formulas. A formula is termed satisfiable if it can be made true by the selection of a suitable object region and a proper definition of the predicates given on it. It is understood that the formulas in question here do not contain symbols of individual objects or individual predicates.

Every identically true formula is, moreover, satisfiable, but the converse is of course not true in the general case. An example of a satisfiable but not identically true formula is the formula $\exists x\,P(x) \supset \forall x\,P(x)$, which was considered above. This formula is identically true only on those object regions which consist of one single object.

It is clear that a formula which is not satisfiable on some object region is identically false on this region. Consequently, the negations of the identically true formulas are precisely the nonsatisfiable formulas: a formula is identically true if and only if its negation is not satisfiable. Thereby the connection between the concepts of satisfiability and identical truth of the formulas of predicate calculus is established.

We can construct examples of formulas which are not satisfiable on any finite object region but are satisfiable on infinite object regions. Moreover, the following theorem due to Löwenheim is valid.

Theorem 2. If a formula of predicate calculus is satisfiable on some infinite object region, then it is also satisfiable on a denumerable object region.

The solvability problem for predicate calculus consists in indicating a single effective technique (algorithm) for determining the satisfiability or nonsatisfiability (on some object region) of any given formula of predicate calculus. In contrast with propositional calculus, where a similar algorithm was constructed without any difficulty, the solvability problem for predicate calculus in the general case, as shown by Church and Turing, has no solution. In other words, there does not exist a single constructive technique for establishing the satisfiability or nonsatisfiability of an arbitrary formula of predicate calculus.

Quite frequently the solvability problem for predicate calculus is formulated in a somewhat different form: find an algorithm for determining the truth (i.e., the identical truth) of any given formula of this calculus.

In view of the contensive completeness of predicate calculus, an algorithm which distinguishes the true formulas of the calculus from the false ones simultaneously solves the problem of distinguishing the provable and unprovable formulas of this calculus. We note also that the truth of any formula implies the nonsatisfiability of its negation. Therefore, if we could decide the question of the satisfiability or nonsatisfiability of all the formulas of predicate calculus, we would also be able to resolve the question of the truth of any formula. Unfortunately, in the general case neither the first nor the second question has a solution.

Thus, with respect to the solvability problem, predicate calculus differs fundamentally from propositional calculus. However, if we limit ourselves to certain particular forms of formulas, a decision algorithm can be constructed in the case of predicate calculus as well.

Such an algorithm can, for example, be constructed for formulas of predicate calculus which contain only single-place predicates. This is a simple consequence of the fact that, to establish the satisfiability or nonsatisfiability of a formula containing $n$ single-place predicates, it is sufficient to consider object regions consisting of no more than $2^n$ objects (objects on which all $n$ predicates take the same values cannot be distinguished by the formula, and there are at most $2^n$ such combinations of values). As a result, verifying the satisfiability (or identical truth) of the predicate formula reduces (after the quantifiers are replaced by disjunctions and conjunctions) to verifying the satisfiability (or, correspondingly, the identical truth) of the corresponding formula of propositional calculus.
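The bound of $2^n$ objects turns the single-place case into a finite search. The following Python sketch only illustrates this reduction and is not the book's own procedure: it assumes a hypothetical tuple encoding of formulas (without equality), evaluates each quantifier as a finite conjunction or disjunction over the region, and tries every region of $1, \ldots, 2^n$ objects with every assignment of the predicates. It is exponential and meant for small formulas only.

```python
from itertools import product

# Hypothetical encoding used only in this sketch:
#   ('pred', 'P', 'x')  single-place predicate P applied to the variable x
#   ('not', F), ('and', F, G), ('or', F, G), ('forall', 'x', F), ('exists', 'x', F)

def evaluate(f, domain, interp, env):
    """Truth value of f; interp maps each predicate name to the set of objects where it holds."""
    tag = f[0]
    if tag == 'pred':
        return env[f[2]] in interp[f[1]]
    if tag == 'not':
        return not evaluate(f[1], domain, interp, env)
    if tag == 'and':
        return evaluate(f[1], domain, interp, env) and evaluate(f[2], domain, interp, env)
    if tag == 'or':
        return evaluate(f[1], domain, interp, env) or evaluate(f[2], domain, interp, env)
    if tag == 'forall':   # a finite conjunction over the region
        return all(evaluate(f[2], domain, interp, {**env, f[1]: d}) for d in domain)
    if tag == 'exists':   # a finite disjunction over the region
        return any(evaluate(f[2], domain, interp, {**env, f[1]: d}) for d in domain)
    raise ValueError(f"unknown connective {tag!r}")

def predicate_names(f):
    if f[0] == 'pred':
        return {f[1]}
    if f[0] == 'not':
        return predicate_names(f[1])
    if f[0] in ('and', 'or'):
        return predicate_names(f[1]) | predicate_names(f[2])
    return predicate_names(f[2])  # quantifier: look inside its scope

def satisfiable(f):
    """Try every region of 1 .. 2**n objects, n = number of predicates in f."""
    names = sorted(predicate_names(f))
    n = len(names)
    for size in range(1, 2 ** n + 1):
        domain = range(size)
        # one truth value for every (predicate, object) pair
        for bits in product((False, True), repeat=n * size):
            interp = {name: {d for d in domain if bits[i * size + d]}
                      for i, name in enumerate(names)}
            if evaluate(f, domain, interp, {}):
                return True
    return False

def identically_true(f):
    # A formula is identically true exactly when its negation is not satisfiable.
    return not satisfiable(('not', f))
```

For instance, $\exists x\,P(x) \wedge \exists x\,\neg P(x)$ fails in every one-object region and is first satisfied in a region of two objects, which is exactly the bound $2^1$.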

In the general case, when deciding the satisfiability or nonsatisfiability of a specific formula, it may be of considerable assistance first to reduce the formula to a so-called normal form. We shall distinguish two kinds of normal forms: the so-called prenex form and the Skolem normal form. The prenex form is characterized by the fact that all the quantifiers (if there are any) must be located at the very beginning of the formula, and the scope of each of them must extend to the end of the formula. In the Skolem normal form it is additionally required that all the existential quantifiers precede all the generality quantifiers.

If the formula is written in prenex form, then the portion standing after the quantifiers (the quantifier-free portion of the formula) can be considered as a formula of propositional calculus (each predicate is considered here simply as a variable proposition). But then we can eliminate all the implication signs in this portion (replacing $A \supset B$ by $\neg A \vee B$) and then reduce it to disjunctive normal form. This transformation carries the original predicate formula into some predicate formula equivalent to it. In many cases the concept of the normal form of a predicate formula includes not only the requirement that the quantifiers be in prenex position, but also the mandatory reduction of its quantifier-free portion to the ideal disjunctive normal form.
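The ideal disjunctive normal form of the quantifier-free portion can be read off a truth table, treating each predicate occurrence such as $P(x)$ as a propositional variable. The following Python sketch shows one way to do this; the tuple encoding and the function names are assumptions made for illustration. Each disjunct contains every atom exactly once, negated or not, and an unsatisfiable portion yields an empty disjunction.

```python
from itertools import product

# Hypothetical encoding for this sketch: an atom is any string such as 'P(x)'
# (treated as a propositional variable); compound formulas are
# ('not', F), ('and', F, G), ('or', F, G) or ('implies', F, G).

def atoms(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, assignment):
    if isinstance(f, str):
        return assignment[f]
    tag = f[0]
    if tag == 'not':
        return not value(f[1], assignment)
    if tag == 'and':
        return value(f[1], assignment) and value(f[2], assignment)
    if tag == 'or':
        return value(f[1], assignment) or value(f[2], assignment)
    if tag == 'implies':  # A ⊃ B is evaluated as ¬A ∨ B
        return (not value(f[1], assignment)) or value(f[2], assignment)
    raise ValueError(f"unknown connective {tag!r}")

def ideal_dnf(f):
    """One full conjunction per true row of the truth table,
    written as a tuple of (atom, sign) pairs covering every atom."""
    names = sorted(atoms(f))
    terms = []
    for row in product((True, False), repeat=len(names)):
        if value(f, dict(zip(names, row))):
            terms.append(tuple(zip(names, row)))
    return terms

def show(terms):
    """Render the disjunction of full conjunctions as text."""
    return ' ∨ '.join(
        '(' + ' ∧ '.join(a if sign else '¬' + a for a, sign in term) + ')'
        for term in terms)
```

For the portion $P(x) \supset Q(x,y)$ this gives three disjuncts, one for each row of the truth table on which the implication is true.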

The  following  theorem  is  valid. 

Theorem 3. For every formula $A$ of (restricted) predicate calculus there exists an equivalent formula $B$ written in prenex form. There exists a uniform constructive technique (algorithm) for reducing any (predicate) formula to prenex form.

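The validity of the formulated theorem follows from the easily verifiable relations

$\vdash \neg\forall x\,P(x) \sim \exists x\,\neg P(x);$   (124)

$\vdash \neg\exists x\,P(x) \sim \forall x\,\neg P(x);$   (125)

$\vdash Q \wedge \forall x\,P(x) \sim \forall x\,(Q \wedge P(x));$   (126)

$\vdash Q \wedge \exists x\,P(x) \sim \exists x\,(Q \wedge P(x));$   (127)

$\vdash Q \vee \forall x\,P(x) \sim \forall x\,(Q \vee P(x));$   (128)

$\vdash Q \vee \exists x\,P(x) \sim \exists x\,(Q \vee P(x)).$   (129)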


Since implication can be replaced by the operations of disjunction and negation, we can, with the aid of the above formulas and observing the conditions which exclude collisions of variables (in (126)-(129) the formula $Q$ must not contain $x$ as a free variable), successively move the quantifiers past all the symbols (other than quantifiers) which make up the formula until all the quantifiers appear at the left end of the formula. For example, the formula $P(x) \vee \neg\forall y\,Q(x,y)$ can first be transformed into the equivalent formula $P(x) \vee \exists y\,\neg Q(x,y)$, and then into the (also equivalent) formula $\exists y\,(P(x) \vee \neg Q(x,y))$, which is the required prenex form of the original formula.
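The procedure can be sketched in Python. Under the assumptions of this sketch (a hypothetical tuple encoding, nonempty object regions, and generated names such as `y_0` not already occurring free in the formula), implications are eliminated first, every bound variable is then renamed apart so that relations (124)-(129) can be applied without collisions, and finally the quantifiers are pulled to the front.

```python
from itertools import count

# Hypothetical tuple encoding: ('pred', 'P', 'x', ...), ('not', F), ('and', F, G),
# ('or', F, G), ('implies', F, G), ('forall', 'x', F), ('exists', 'x', F).

def drop_implications(f):
    tag = f[0]
    if tag == 'pred':
        return f
    if tag == 'not':
        return ('not', drop_implications(f[1]))
    if tag == 'implies':  # A ⊃ B becomes ¬A ∨ B
        return ('or', ('not', drop_implications(f[1])), drop_implications(f[2]))
    if tag in ('and', 'or'):
        return (tag, drop_implications(f[1]), drop_implications(f[2]))
    if tag in ('forall', 'exists'):
        return (tag, f[1], drop_implications(f[2]))
    raise ValueError(f"unknown connective {tag!r}")

def rename_bound(f, env, counter):
    """Give every bound variable a fresh name so that no collision can occur."""
    tag = f[0]
    if tag == 'pred':
        return ('pred', f[1]) + tuple(env.get(v, v) for v in f[2:])
    if tag == 'not':
        return ('not', rename_bound(f[1], env, counter))
    if tag in ('and', 'or'):
        return (tag, rename_bound(f[1], env, counter), rename_bound(f[2], env, counter))
    if tag in ('forall', 'exists'):
        new = f"{f[1]}_{next(counter)}"
        return (tag, new, rename_bound(f[2], {**env, f[1]: new}, counter))
    raise ValueError(f"unknown connective {tag!r}")

def pull_quantifiers(f):
    """Return (prefix, matrix), moving quantifiers outward by (124)-(129)."""
    tag = f[0]
    if tag == 'pred':
        return [], f
    if tag == 'not':  # (124), (125): negation flips every quantifier it passes
        prefix, matrix = pull_quantifiers(f[1])
        flipped = [('exists' if q == 'forall' else 'forall', v) for q, v in prefix]
        return flipped, ('not', matrix)
    if tag in ('and', 'or'):  # (126)-(129): safe because bound names are all distinct
        p1, m1 = pull_quantifiers(f[1])
        p2, m2 = pull_quantifiers(f[2])
        return p1 + p2, (tag, m1, m2)
    if tag in ('forall', 'exists'):
        prefix, matrix = pull_quantifiers(f[2])
        return [(tag, f[1])] + prefix, matrix
    raise ValueError(f"unknown connective {tag!r}")

def to_prenex(f):
    g = rename_bound(drop_implications(f), {}, count())
    prefix, matrix = pull_quantifiers(g)
    for q, v in reversed(prefix):
        matrix = (q, v, matrix)
    return matrix
```

Applied to $P(x) \vee \neg\forall y\,Q(x,y)$, this returns the encoding of $\exists y_0\,(P(x) \vee \neg Q(x,y_0))$, the same prenex form as in the worked example up to the name of the bound variable.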

A direct analogue of theorem 3 does not exist for the Skolem normal form: not every formula of predicate calculus has an equivalent formula in Skolem normal form.

However, the concept of equivalence can be generalized so that any formula of predicate calculus can be reduced to Skolem normal form. This generalization is given by the concept of so-called deductive equivalence [61].


The formula $A$ is termed deductively equivalent to the formula $B$ if adjoining formula $A$ to the axiom set of the calculus makes it possible to deduce formula $B$ from the thus expanded system of axioms and, conversely, adjoining formula $B$ to the axiom set makes it possible to deduce formula $A$.

This definition is applicable not only to predicate calculus but also to any other logical calculus, in particular to propositional calculus. Since in propositional calculus the adjoining of any nondeducible formula to the axiom set makes all formulas deducible, any two nondeducible formulas of propositional calculus are deductively equivalent. It is also clear that any two deducible formulas (in any calculus) are deductively equivalent. At the same time, a deducible formula and a nondeducible formula can never be deductively equivalent, since adjoining the first (deducible) formula to the axiom system does not make the second formula deducible.

Thus, in propositional calculus all deducible formulas are deductively equivalent to one another, and so are all nondeducible formulas. It is also easy to see that in predicate calculus (as, moreover, in propositional calculus) conventional equivalence of formulas implies their deductive equivalence. The converse, however, is not true in general: for example, two elementary propositional variables (distinct letters) $P$ and $Q$ are not equivalent to one another, yet they are deductively equivalent.

The  following  theorem  due  to  Skolem  is  valid. 

Theorem 4. For every formula of (restricted) predicate calculus there exists a deductively equivalent formula written in Skolem normal form. There exists a uniform constructive technique (algorithm) for reducing any predicate formula to a deductively equivalent Skolem form.

It can be shown that if two formulas are deductively equivalent, then the identical truth of one of them implies the identical truth of the other. Since there exists a technique for reducing any formula of restricted predicate calculus to a deductively equivalent formula in Skolem normal form, when deciding the identical truth of particular formulas we can replace these formulas by their Skolem normal forms. This fact can also be used to prove the contensive completeness of predicate calculus: in view of what has been said above, for that proof it is sufficient to establish the deducibility of all identically true formulas written in Skolem normal form. Indeed, by establishing the deducibility of all such formulas we thereby establish the deducibility of all formulas deductively equivalent to them, i.e., of all identically true formulas of predicate calculus.

If a formula of predicate calculus contains free variables, it is termed an open formula. Formulas which do not contain free variables are customarily termed closed formulas. If $x_1, x_2, \ldots, x_n$ are all the free variables of the open formula $A$, then the closed formula $\forall x_1 \forall x_2 \ldots \forall x_n A$ is termed the closure of formula $A$. Any formula is deductively equivalent to its closure, and therefore the two formulas are either both identically true or both not identically true.
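The closure operation is easy to mechanize. The sketch below uses a hypothetical tuple encoding chosen for illustration: it computes the free variables of a formula and prefixes a generality quantifier for each of them. Placing the quantifiers in alphabetical order is an arbitrary choice of this sketch; any order gives an equivalent closure.

```python
# Hypothetical tuple encoding: ('pred', 'P', 'x', ...) with variables as arguments,
# ('not', F), ('and', F, G), ('or', F, G), ('implies', F, G),
# ('forall', 'x', F), ('exists', 'x', F).

def free_variables(f):
    tag = f[0]
    if tag == 'pred':
        return set(f[2:])
    if tag == 'not':
        return free_variables(f[1])
    if tag in ('and', 'or', 'implies'):
        return free_variables(f[1]) | free_variables(f[2])
    if tag in ('forall', 'exists'):
        return free_variables(f[2]) - {f[1]}  # the quantified variable is bound
    raise ValueError(f"unknown connective {tag!r}")

def closure(f):
    """Prefix a generality quantifier for every free variable of f."""
    for v in sorted(free_variables(f), reverse=True):
        f = ('forall', v, f)
    return f
```

A closed formula has no free variables and is returned unchanged.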

If the problem of solvability is taken in the sense of finding an algorithm which distinguishes the true formulas of predicate calculus from the false ones, then the closure of formulas, just like the reduction of formulas to normal form (prenex or Skolem), reduces the general problem of solvability to the corresponding problem for formulas of some special form. Of course, this reduction does not by itself solve the problem of solvability for all formulas of restricted predicate calculus. We can, however, identify several quite broad classes of formulas for which decision procedures exist. One class of this kind (formulas containing only single-place predicates) was considered above.

The decidability problem has a positive solution for closed formulas written in prenex form with either only generality quantifiers (A-formulas) or only existential quantifiers (E-formulas). If we denote the number of these quantifiers by $m$, then the following theorem is valid [1].

Theorem 5. For closed A-formulas with $m$ quantifiers, truth need be established only for object regions which contain no more than $m$ objects. If such a formula is true in the region consisting of $m$ objects, then it is an identically true formula.

Theorem 6. A closed E-formula is identically true if it is true in the object region containing only one object. If it is true in some region, then it is also true in any other region with a larger number of objects.

The decision procedures for closed A-formulas and E-formulas follow directly from these theorems: just as in the case of formulas with single-place predicates, the finiteness of the object region permits the question of the truth of the predicate formulas to be reduced to the question of the truth of the corresponding formulas of propositional calculus.

The decision problem also has a positive solution for all AE-formulas, i.e., for those closed formulas of restricted predicate calculus in whose prenex normal form all the generality quantifiers precede all the existential quantifiers (in the Skolem normal form the order of the quantifiers is reversed). All closed AEA-formulas in which the number of existential quantifiers does not exceed two are also decidable.


We note that in all the cases considered here, decidability was understood in the sense of establishing the truth or falsity of the formulas. If decidability is instead understood in the sense of establishing the satisfiability or nonsatisfiability of the formulas, decidable classes of formulas are obtained from the classes listed above (decidable in the first sense) by replacing all the existential quantifiers by generality quantifiers and vice versa (a formula is identically true exactly when its negation is nonsatisfiable, and carrying the negation through the quantifier prefix by (124) and (125) interchanges the two kinds of quantifiers). Thus, for example, the class of all EA-formulas and the class of all EAE-formulas containing no more than two generality quantifiers are decidable in the sense of establishing satisfiability or nonsatisfiability.

A large number of classes of formulas for which the decision problem is resolved positively has now been established. The restrictions used to identify these classes concern not only the kind, number and order of the quantifiers, but also the form of the quantifier-free parts of the formulas (written in prenex normal form).

The possibility of constructing decision procedures beyond the limits of restricted predicate calculus has also been investigated, in particular procedures for deciding certain formulas of second degree predicate calculus. In second degree predicate calculus use is made not only of object quantifiers but also of predicate quantifiers ("for any predicate P," "there exists a predicate P"); however, the predicates can depend only on the object variables and cannot be included in the system of objects composing the object region.

Second degree predicate calculus in its general form is not only undecidable but also (as shown by Godel) cannot have any complete axiom system. Nevertheless, even in this calculus there exist quite broad decidable parts. One such part, for example, is the so-called AND-calculus. In this calculus the object region is the set of all natural numbers, and all the predicates are single-place. A corresponding result was announced by Büchi, who somewhat earlier had constructed a decision algorithm for a weakened variant of the AND-calculus which has found application in the theory of finite automata.

§2. FORMAL ARITHMETIC AND THE GODEL THEOREM

A formal arithmetic can be constructed on the basis of (restricted) predicate calculus. The objects used for the construction of the formal arithmetic are the whole nonnegative numbers $0, 1, 2, 3, \ldots$. On the set of all such numbers there are defined the conventional arithmetic operations of addition and multiplication, and also the operation of direct succession $a' = a + 1$ ($a = 0, 1, 2, \ldots$). This operation gives a method for the unique representation of all the natural numbers: $1 = 0'$, $2 = 1' = 0''$, $3 = 2' = 0'''$, etc.

Arithmetic expressions, customarily termed terms, are composed with the aid of these operations from the whole nonnegative numbers and from variables which range over the whole nonnegative values. Examples of such terms are the expressions $x$, $0'$, $x \cdot y' + a'' \cdot z$. We note that when brackets do not fix a particular order of operations, the direct succession operation has priority; it is followed by multiplication and then by addition. Thus, for example, the expression $x \cdot y' + z'$ must be understood as $((x) \cdot (y')) + (z')$ and not otherwise.
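The priority rules can be illustrated by a small recursive-descent parser with one parsing function per priority level (addition lowest, then multiplication, then direct succession). This is a sketch with conventions of our own: variables are single letters, multiplication may be written `·` or `*`, and the result is a nested tuple, so that `x·y'+z'` yields the grouping $((x) \cdot (y')) + (z')$. The small evaluator reads $'$ as adding one.

```python
def tokenize(text):
    tokens = []
    for ch in text:
        if ch.isspace():
            continue
        if ch == '*':
            ch = '·'  # accept ASCII '*' for multiplication
        if ch in "0+·'()" or ch.isalpha():
            tokens.append(ch)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        t = self.sum()
        if self.peek() is not None:
            raise ValueError("trailing input")
        return t

    def sum(self):  # lowest priority: addition, grouped to the left
        t = self.product()
        while self.peek() == '+':
            self.take()
            t = ('+', t, self.product())
        return t

    def product(self):  # middle priority: multiplication
        t = self.postfix()
        while self.peek() == '·':
            self.take()
            t = ('·', t, self.postfix())
        return t

    def postfix(self):  # highest priority: direct succession
        t = self.atom()
        while self.peek() == "'":
            self.take()
            t = ("'", t)
        return t

    def atom(self):
        tok = self.take()
        if tok == '(':
            t = self.sum()
            if self.take() != ')':
                raise ValueError("expected ')'")
            return t
        if tok is not None and (tok == '0' or tok.isalpha()):
            return tok
        raise ValueError(f"unexpected token {tok!r}")

def parse(text):
    return Parser(text).parse()

def evaluate(term, env):
    if term == '0':
        return 0
    if isinstance(term, str):
        return env[term]
    if term[0] == "'":
        return evaluate(term[1], env) + 1
    left, right = evaluate(term[1], env), evaluate(term[2], env)
    return left + right if term[0] == '+' else left * right
```

With $x = 2$, $y = 3$, $z = 4$ the term $x \cdot y' + z'$ evaluates to $2 \cdot 4 + 5 = 13$.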

By combining two terms with an equality sign, we obtain a proposition which is true or false depending on whether the indicated equality is true or false. All such propositions constitute the set of so-called elementary formulas of formal arithmetic. If the terms composing the proposition contain variables, then this proposition is a predicate, naturally termed an elementary arithmetic predicate. From such elementary predicates, with the aid of the operations of (restricted) predicate calculus (including quantifier binding), more complex arithmetic predicates are constructed. All the predicates which can be constructed in this way are termed Godel arithmetic predicates.

The formulas of the formal arithmetic system which we have constructed are limited to the formulas which can be constructed from the elementary formulas with the aid of the operations of (restricted) predicate calculus. The axiom system of the formal arithmetic is obtained by supplementing the axiom system of (restricted) predicate calculus (axioms 1-15) with the specific arithmetic axioms:

16. $(P(0) \wedge \forall x\,(P(x) \supset P(x'))) \supset P(x)$ (axiom of mathematical induction).

17. $a' = b' \supset a = b$.

18. $\neg\, a' = 0$.

19. $a = b \supset (a = c \supset b = c)$.

20. $a = b \supset a' = b'$.

21. $a + 0 = a$.

22. $a + b' = (a + b)'$.

23. $a \cdot 0 = 0$.

24. $a \cdot b' = a \cdot b + a$.

For the proper understanding of these relations it is necessary to note that, in order to economize on brackets, a definite order of priority of operations is established in the formulas of the formal arithmetic system which we have constructed: all arithmetic operations (direct succession, multiplication and addition) have priority over equality, and the latter has priority over all the logical operations.

Having the system of axioms (including the deduction rules 11, 12 and 15), we can transfer to the formal arithmetic the concepts of (formal) demonstrability (deducibility) and nondeducibility of formulas, and also the concepts of formal deduction, identically true and identically false formulas, etc.
With the aid of these axioms we can also establish, in a rigorously formal manner, the laws of arithmetic, such as the commutative and associative laws for addition and multiplication, the distributive law of multiplication with respect to addition, etc. We can prove the relations $\vdash a + 1 = a'$, $\vdash a \cdot 0' = a$, $\vdash a + b = 0 \supset a = 0 \wedge b = 0$, $\vdash a \cdot b = 0 \supset a = 0 \vee b = 0$, and others.
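Axioms 21-24 can be read as recursive rules for computing sums and products of numerals. The following sketch assumes numerals are stored as strings such as `0'''` (the helper names are hypothetical). It applies only these four equalities, and so for concrete numerals it also reproduces the relations $a + 1 = a'$ and $a \cdot 0' = a$. It computes values; it does not construct formal deductions.

```python
def numeral(a):
    """The formal numeral for the number a: the symbol 0 followed by a primes."""
    return "0" + "'" * a

def add(a, b):
    # b must be a numeral: either "0" or some numeral c followed by one prime
    if b == "0":
        return a                    # axiom 21: a + 0 = a
    return add(a, b[:-1]) + "'"     # axiom 22: a + c' = (a + c)'

def mul(a, b):
    if b == "0":
        return "0"                  # axiom 23: a · 0 = 0
    return add(mul(a, b[:-1]), a)   # axiom 24: a · c' = a · c + a
```

For example, `mul(numeral(2), numeral(3))` unfolds axiom 24 three times and axiom 22 six times before returning `numeral(6)`.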

Using  axiom  16  we  can  derive  the  following  general  rule  for  proof 
by  the  method  of  induction. 

Let $\Gamma$ be a set of formulas of (formal) arithmetic which do not contain the variable $x$ as a free variable, and let $P(x)$ be a formula in which the variable $x$ occurs free. Then, if $\Gamma \vdash P(0)$ and $\Gamma, P(x) \vdash P(x')$ (without varying the free variables of $P(x)$), it follows that $\Gamma \vdash P(x)$.

Continuing the arguments in this fashion, we can prove in a rigorously formal manner all the basic theorems and justify all the basic proof techniques used in elementary arithmetic constructed by the contensive method.

The resolution of such basic questions as the consistency and completeness of the axiom system is much more complex for formal arithmetic than for restricted predicate calculus. Thus, to prove consistency it is necessary to go beyond the framework of strictly finite methods. Completeness, in general, does not hold for the formal arithmetic system which we have constructed. Moreover, incompleteness is retained for any consistent extension of this system obtained by supplementing the axiom system written out above with any finite number of compatible (i.e., not leading to a contradiction) new axioms. This is the sense of the celebrated Godel theorem on the incompleteness of arithmetic, which forced a new look at the entire problem of the foundations of mathematics and of the automation (on the basis of complete formalization) of the deduction of new theorems in deductively constructed theories.

In order to clarify the basic idea of the proof of the theorem on the incompleteness of arithmetic, it is necessary first to become acquainted with several concepts and auxiliary results.

First  of  all  we  must  formally  define  the  very  concept  of  completeness. 

To establish the class of deducible (demonstrable) formulas of arithmetic it is sufficient to consider only closed formulas. Indeed, because of the easily verifiable relations $P(x_1, x_2, \ldots, x_n) \vdash \forall x_1 \forall x_2 \ldots \forall x_n P(x_1, x_2, \ldots, x_n)$ and $\forall x_1 \forall x_2 \ldots \forall x_n P(x_1, x_2, \ldots, x_n) \vdash P(x_1, x_2, \ldots, x_n)$ of predicate calculus, the demonstrability of a formula implies the demonstrability of its closure, and vice versa.

A formal arithmetic system is termed (simply) complete if every closed formula $A$ is formally decidable, i.e., if one of the formulas $A$ or $\neg A$ is demonstrable.

If the negation $\neg A$ of a formula $A$ is demonstrable, then the formula $A$ itself is termed (formally) refutable. Formal decidability of a closed formula thus means that this formula is either demonstrable or refutable. Since from the naive contensive point of view a closed formula must be either true or false, the condition of its formal decidability is a very natural criterion for judging the completeness of the corresponding formal system. The basic idea of the proof of the theorem on the incompleteness of arithmetic consists precisely in the actual construction of a formally undecidable closed formula. For this construction we need several auxiliary results, which we shall now consider.

In the contensive sense an arithmetic predicate is any (not necessarily constructively defined) predicate on the set of all whole nonnegative numbers. The formal arithmetic system which we have constructed gives a method for the constructive specification of some arithmetic predicates. To establish this method, let us introduce the following definition.

The arithmetic predicate $P(x_1, x_2, \ldots, x_n)$ is termed numerically expressible if there exists a formula $P_1(x_1, x_2, \ldots, x_n)$ of the formal arithmetic system which we are considering, containing no free variables other than $x_1, x_2, \ldots, x_n$, such that for any concrete set of $n$ whole nonnegative numbers $a_1, a_2, \ldots, a_n$ the following conditions are satisfied:

1) if the proposition $P(a_1, a_2, \ldots, a_n)$ is true, then $\vdash P_1(a_1, a_2, \ldots, a_n)$;

2) if the proposition $P(a_1, a_2, \ldots, a_n)$ is false, then $\vdash \neg P_1(a_1, a_2, \ldots, a_n)$.

In this case we say that the formula $P_1(x_1, x_2, \ldots, x_n)$ numerically expresses the predicate $P(x_1, x_2, \ldots, x_n)$.

It is easy to show that the arithmetic predicates $x = y$ and $x < y$ are numerically expressed by the formulas $x = y$ and $\exists w\,(w' + x = y)$, respectively.

The formula $P_1(x_1, x_2, \ldots, x_n)$ considered above is decidable for any concrete set of values of its free variables $x_1, x_2, \ldots, x_n$. Formulas having this property are customarily termed numerically decidable formulas. In light of what has been said, the truth of a predicate numerically expressed by such a formula can be verified by a constructive method for any set of values of the object variables. In the formal arithmetic system which we have constructed, each formula without variables is decidable, and each formula without quantifiers is numerically decidable.

The concept of numerical expressibility can be introduced not only for arithmetic predicates but also for (contensively defined) arithmetic functions, whose values are whole nonnegative numbers. We say that the formula $P(x_1, x_2, \ldots, x_n, y)$ of a formal arithmetic system numerically represents the arithmetic function $f(x_1, x_2, \ldots, x_n)$ if for any set of $n$ whole nonnegative numbers $a_1, a_2, \ldots, a_n$ the following conditions are satisfied:

1) if $f(a_1, a_2, \ldots, a_n) = b$, then $\vdash P(a_1, a_2, \ldots, a_n, b)$;

2) $\vdash \exists y\,(P(a_1, a_2, \ldots, a_n, y) \wedge \forall z\,(P(a_1, a_2, \ldots, a_n, z) \supset z = y))$.

The second of these conditions expresses, in the language of predicate calculus, the uniqueness of the specification of the function $f$ with the aid of the predicate $P$.

We note that the possibility of effectively specifying predicates is not necessarily associated with the use of the apparatus of formal arithmetic. Any $n$-place arithmetic predicate $P(x_1, x_2, \ldots, x_n)$ can be specified with the aid of the $n$-place arithmetic function $\varphi(x_1, x_2, \ldots, x_n)$ which takes the value 0 on all sets of values of the variables $x_1, x_2, \ldots, x_n$ on which the predicate $P$ is true, and the value 1 on all sets on which the predicate $P$ is false. This function is termed the representative function of the predicate $P$. A predicate whose representative function is primitive recursive or general recursive is termed, respectively, a primitive recursive or a general recursive predicate.
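To make the definition concrete, the following sketch builds the representative function of the predicate $x < y$ from predecessor, truncated subtraction and signum, which are standard primitive recursive functions (the names `pred`, `monus`, `sg` are our own labels). It cross-checks the result against the contensive reading of the formula $\exists w\,(w' + x = y)$ mentioned earlier. The Python loops only imitate the recursion schemes; this is an illustration, not a formal derivation.

```python
def pred(n):
    # predecessor: pred(0) = 0, pred(k') = k
    return 0 if n == 0 else n - 1

def monus(x, y):
    # truncated subtraction by primitive recursion on y:
    # monus(x, 0) = x,  monus(x, y') = pred(monus(x, y))
    result = x
    for _ in range(y):
        result = pred(result)
    return result

def sg(n):
    # signum: sg(0) = 0, sg(k') = 1
    return 0 if n == 0 else 1

def less_than_rep(x, y):
    """Representative function of x < y: 0 where the predicate is true, 1 where false.
    x < y holds exactly when (x + 1) monus y equals 0."""
    return sg(monus(x + 1, y))

def less_than_via_formula(x, y):
    """Contensive reading of ∃w (w' + x = y); any witness w is at most y."""
    return any((w + 1) + x == y for w in range(y + 1))
```

Both readings agree with the ordinary comparison on small numbers, e.g. `less_than_rep(2, 5) == 0` and `less_than_rep(5, 2) == 1`.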

Godel  has  established  the  following  result. 

Theorem 1. If the arithmetic function $\varphi(x_1, x_2, \ldots, x_n)$ is primitive recursive, then the $(n+1)$-place predicate $\varphi(x_1, x_2, \ldots, x_n) = y$ is Godel arithmetic, i.e., expressible by means of formal arithmetic.

From  this  theorem  it  follows,  in  particular,  that  all  primitive 
recursive  predicates  are  Godel  arithmetic  predicates. 

With  the  aid  of  theorem  1  we  can  easily  establish  the  validity  of 
the  following  proposition. 
Theorem 2. Every primitive recursive predicate is numerically expressible in formal arithmetic.

The following result concerning general recursive predicates has been established by Kleene [42] and Post.

Theorem 3. For any $n > 0$, every general recursive predicate $P(x_1, x_2, \ldots, x_n)$ can be represented both in the form $\exists y\,R(x_1, x_2, \ldots, x_n, y)$ and in the form $\forall y\,S(x_1, x_2, \ldots, x_n, y)$, where the predicates $R$ and $S$ are primitive recursive. Conversely, every predicate which is representable in each of these two forms is general recursive, and it remains general recursive also when the predicates $R$ and $S$ are not primitive recursive but only general recursive.
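The converse half of Theorem 3 suggests why such a predicate is decidable: given both representations, one can search $y = 0, 1, 2, \ldots$ until either a witness for $R$ or a counterexample for $S$ appears, and exactly one of these two events must eventually occur. The sketch below illustrates this under that assumption; the example predicate "x is even", with $R(x, y)$: $2y = x$ and $S(x, y)$: $2y + 1 \neq x$, is our own choice.

```python
def decide(R, S, x):
    """Decide P(x), assuming P(x) <-> ∃y R(x, y) and P(x) <-> ∀y S(x, y)."""
    y = 0
    while True:
        if R(x, y):
            return True    # witness found: ∃y R(x, y) holds, hence P(x)
        if not S(x, y):
            return False   # counterexample found: ∀y S(x, y) fails, hence not P(x)
        y += 1

# Example predicate: "x is even", given in both forms with primitive recursive R and S.
def R_even(x, y):
    return 2 * y == x

def S_even(x, y):
    return 2 * y + 1 != x
```

Termination is guaranteed only because both representations are available; with the $\exists$-form alone the search could run forever on a false instance.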

The  following  theorem  due  to  Kleene  [42]  is  also  valid. 

Theorem 4. All general recursive predicates are Godel arithmetic.

For various kinds of complex constructions and proofs in formal arithmetic it is advisable to introduce a special numbering of all its formulas and of the proofs of these formulas (within the apparatus of the formal arithmetic system which we have constructed). Such a numbering was proposed by Godel and is therefore termed Godel numeration. There are many different ways of accomplishing this. Let us consider one of them.

Before defining the numbering of the formulas it is advisable to alter somewhat the method of writing them, considering not only the formulas themselves but also their individual parts as formal objects, termed entities. The elementary entities are, first, all the logical symbols ($\supset$, $\wedge$, $\vee$, $\neg$, $\forall$, $\exists$), the equality symbol ($=$), the symbols for the arithmetic operations ($+$, $\cdot$, $'$), the zero symbol ($0$), and the two symbols used to designate the different object variables ($x$, $|$). To the different object variables $x, y, z, \ldots$ there are associated the entities $x$, $(|, x)$, $(|, (|, x))$, etc. To the terms and formulas of the form $r + s$, $r'$, $r = s$, $A \vee B$, $A \wedge B$, $\neg A$, $\forall u\,A(u)$, $\exists u\,A(u)$ there are associated the respective entities $(+, r, s)$, $(', r)$, $(=, r, s)$, $(\vee, A, B)$, $(\wedge, A, B)$, $(\neg, A)$, $(\forall, u, A(u))$, $(\exists, u, A(u))$. For simplicity of notation, the symbols $r, s, A, B, u, A(u)$ and the entities corresponding to them are not distinguished here. Using these definitions, we can construct, step by step, the entities corresponding to the various formulas and their component parts. For example, the formula $\forall y\,(0' + x = y)$ corresponds to the entity $(\forall, (|, x), (=, (+, (', 0), x), (|, x)))$, the term $0' + x$ corresponds to the entity $(+, (', 0), x)$, the formula $x = y$ corresponds to the entity $(=, x, (|, x))$, etc.

Now let us assign to the elementary entities various odd numbers (the Godel numbers of these entities):

⊃   ∧   ∨   ¬   ∀   ∃   =   +   ·   '   0   x   |
3   5   7   9   11  13  15  17  19  21  23  25  27

Designating by $p_0, p_1, p_2, \ldots$ the sequence of all prime numbers ($p_0 = 2$, $p_1 = 3$, $p_2 = 5$, etc.) and assuming that the entities $a_0, a_1, \ldots, a_m$ are already associated with the Godel numbers $n_0, n_1, \ldots, n_m$, let us associate the entity $(a_0, a_1, \ldots, a_m)$ with the Godel number $p_0^{n_0} \cdot p_1^{n_1} \cdots p_m^{n_m}$. Associating with each formula of the formal arithmetic system which we are considering the Godel number of its corresponding entity, we obtain the sought Godel numeration of all these formulas. For example, the formula $x = y$, to which there corresponds the entity $(=, x, (|, x))$, has the Godel number $2^{15} \cdot 3^{25} \cdot 5^{2^{27} \cdot 3^{25}}$; to the term $0' + x$, with its corresponding entity $(+, (', 0), x)$, there must be assigned the Godel number $2^{17} \cdot 3^{2^{21} \cdot 3^{23}} \cdot 5^{25}$, etc.


It is easy to see that the Godel number of every nonelementary entity is necessarily even (its factorization always contains a positive power of $p_0 = 2$), while the Godel numbers of the elementary entities are odd. Keeping this fact in mind, together with the uniqueness of the expansion of every natural number into prime factors, it is not difficult to see that from a Godel number we can uniquely recover the corresponding formula. This means that the correspondence between the formulas of the formal arithmetic system which we are considering and their Godel numbers is one-to-one. Thus, when necessary, we can work with the Godel numbers of the formulas of this system instead of the formulas themselves.
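The numeration can be sketched in Python. The symbol codes below follow the table of odd numbers given for the elementary entities; entities are nested tuples of symbol strings, a representation chosen only for this sketch. The function `decode` uses the two facts just noted: odd numbers are elementary symbols, and an even number is factored prime by prime. Actual Godel numbers grow extremely fast (for $x = y$ the exponent of 5 is already $2^{27} \cdot 3^{25}$), so only very small entities can be handled this way.

```python
SYMBOL_CODES = {"⊃": 3, "∧": 5, "∨": 7, "¬": 9, "∀": 11, "∃": 13, "=": 15,
                "+": 17, "·": 19, "'": 21, "0": 23, "x": 25, "|": 27}
CODE_SYMBOLS = {code: sym for sym, code in SYMBOL_CODES.items()}

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for short entities)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

def encode(entity):
    """Odd code for an elementary symbol; p0^n0 * p1^n1 * ... * pm^nm for a tuple."""
    if isinstance(entity, str):
        return SYMBOL_CODES[entity]
    number = 1
    for p, part in zip(primes(), entity):
        number *= p ** encode(part)
    return number

def decode(number):
    """Recover the entity: odd means elementary, even means a tuple of parts."""
    if number < 3:
        raise ValueError("not a Godel number of an entity")
    if number % 2 == 1:
        return CODE_SYMBOLS[number]
    parts = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        parts.append(decode(exponent))
    return tuple(parts)
```

For instance, the entity $(=, x, x)$ of the formula $x = x$ receives the even number $2^{15} \cdot 3^{25} \cdot 5^{25}$, and decoding that number gives the entity back.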

By analogy with the Godel numeration of the formulas of arithmetic, we can introduce a Godel numeration for all possible finite sequences of such formulas, among which there will be, in particular, the proofs of all the demonstrable arithmetic formulas.

For any whole nonnegative number $a$, understood contensively, we shall use $\underline{a}$ to designate the representation of this number in the formal arithmetic ($\underline{a}$ is the symbol 0 followed by $a$ primes). Fixing some Godel numeration of the arithmetic formulas, for any Godel number $n$ we shall use $P_n$ to denote the formula which has the number $n$ in our numeration. We single out in the formula $P_n$ the variable $x$ (on which the formula may actually not depend), writing the formula as $P_n(x)$.

Let us now define two arithmetic predicates A(a, b) and B(a, b). The first predicate is considered true if and only if the number a is the Godel number of a formula $P_a(x)$ such that the formula $P_a(\bar{a})$ is demonstrable, and the number b is the Godel number of some proof of it.

Similarly, the predicate B(a, b) is considered true if and only if the number a is the Godel number of a formula $P_a(x)$ for which the formula $P_a(\bar{a})$ is refutable, and the number b is the Godel number of a proof of the formula $\neg P_a(\bar{a})$.

Using  the  theorems  formulated  above,  we  can  prove  the  validity  of 
the  following  important  lemma. 

Lemma. The arithmetic predicates A(a, b) and B(a, b) which we have defined (for the Godel numeration fixed above) are numerically expressible in the formal arithmetic system with the axioms 1-24.


Let us construct the formulas $A_1(a, b)$ and $B_1(a, b)$ which numerically express the predicates A(a, b) and B(a, b), respectively, and let us consider the formula $\forall y\,\neg A_1(x, y)$. This formula has some Godel number p and therefore coincides with the formula which we agreed to denote by $P_p(x)$. Now let us consider the formula $P_p(\bar{p})$, which has no free variables. This formula, written explicitly as $\forall y\,\neg A_1(\bar{p}, y)$, can be regarded as a proposition asserting its own nondemonstrability. Indeed, this proposition states that no number is the Godel number of a proof of the formula obtained from $P_p(x)$ by replacing x with $\bar{p}$. But this replacement is exactly what transforms the formula $P_p(x)$ into the formula $\forall y\,\neg A_1(\bar{p}, y)$ which we have constructed.
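The substitution step that makes the formula speak about itself can be imitated with strings. In this sketch the numeration is a hypothetical stand-in (a formula's number is simply its index in a list, not a Godel number), `numeral` builds $\bar{a}$ as the symbol 0 followed by a primes, and `instantiate` forms $P_n(\bar{a})$ by replacing the variable x:

```python
# Hypothetical toy numeration: a formula's number is its position in this list.
# Any fixed one-to-one numbering is enough to illustrate the substitution step.
FORMULAS = [
    "x = x",
    "forall y not A1(x, y)",        # plays the role of P_p(x)
]


def numeral(a: int) -> str:
    """The formal numeral for a: the symbol 0 followed by a primes."""
    return "0" + "'" * a


def instantiate(n: int, a: int) -> str:
    """P_n(numeral(a)): replace the variable x in P_n by the numeral for a."""
    return FORMULAS[n].replace("x", numeral(a))


p = FORMULAS.index("forall y not A1(x, y)")
print(p, instantiate(p, p))   # the resulting sentence mentions its own number p
```

The printed sentence contains the numeral of p, the very number under which its own template $P_p(x)$ is listed.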

It turns out that, under some additional assumptions, these properties of the formula $\forall y\,\neg A_1(\bar{p}, y)$ imply its undecidability, which in turn proves the incompleteness of the formal arithmetic system we have constructed. The additional assumption needed here is that the formal arithmetic is ω-consistent.

By the ω-consistency of the formal arithmetic system we mean the following property: there is no formula P(x) for which the formula $\neg\forall x P(x)$ is demonstrable while all the formulas P(0), P(1), P(2), ... are also demonstrable.

The ω-consistency of the formal arithmetic implies its simple consistency. Indeed, let P be any demonstrable formula not containing free variables (for example, the formula 0 = 0). Introducing into this formula the dummy variable x, on which P does not actually depend, we write it in the form P(x). Then all the formulas P(0), P(1), ... coincide with P and, consequently, are demonstrable. By the ω-consistency of the system, this means that the formula $\neg\forall x P(x)$, which actually coincides with the formula $\neg P$, is nondemonstrable. However, if the system contained a contradiction, then by the property of weak ¬-removal (see formula (7) of §5, Chapter 2) all the formulas of this system would be demonstrable. Since the formula $\neg P$ is nondemonstrable, our system is (simply) consistent.

Now we can prove the Godel theorem on the incompleteness of the arithmetic in its primitive (weak) form.

Theorem 5. If the formal arithmetic is ω-consistent, then the formula $\forall y\,\neg A_1(\bar{p}, y)$ constructed above is an example of a nondecidable formula.

Proof. Let us first assume that the formula $\forall y\,\neg A_1(\bar{p}, y)$ is demonstrable. We use k to denote the Godel number of a proof of this formula. Then the proposition A(p, k) is true and, consequently, the formula $A_1(\bar{p}, \bar{k})$ is deducible. Using the operation of ∃-insertion (see §1 of the present chapter), we obtain $\vdash \exists y\, A_1(\bar{p}, y)$, or, using formula (125), $\vdash \neg\forall y\,\neg A_1(\bar{p}, y)$. But then, by the assumption made above, the formal arithmetic system we are considering would be (simply) inconsistent, which is excluded in view of its ω-consistency.

Let us now assume that the formula $\neg\forall y\,\neg A_1(\bar{p}, y)$ is demonstrable.

As has been proved, the formula $\forall y\,\neg A_1(\bar{p}, y)$ is nondemonstrable. Therefore none of the numbers 0, 1, 2, ... is the Godel number of a proof of the latter formula. This means that all the propositions A(p, 0), A(p, 1), A(p, 2), ... are false, and consequently, in view of the numerical expressibility of the predicate A, all the formulas of the form $\neg A_1(\bar{p}, \bar{i})$ are deducible for i = 0, 1, 2, .... But then, by the assumption of the ω-consistency of the system, the formula $\neg\forall y\,\neg A_1(\bar{p}, y)$ is nondemonstrable, which contradicts the assumption we made about its demonstrability.

Thus, the formula $\forall y\,\neg A_1(\bar{p}, y)$ can be neither demonstrable nor refutable and, consequently, is an example of a nondecidable formula. The theorem is thereby proved.

As this proof shows, to establish the nondemonstrability of the formula we have constructed, the assumption of the simple consistency of the formal arithmetic system suffices; the assumption of its ω-consistency is used only to prove the nonrefutability of this formula.

Rosser has shown [71] that one can construct an example of a formula whose nondecidability is established without the assumption of the ω-consistency of the formal arithmetic; its simple consistency suffices. The formula is constructed as follows. First, from the predicates A(a, b) and B(a, b) defined above we construct the formula $\forall y\,(\neg A_1(x, y) \lor \exists z\,(z < y \land B_1(x, z)))$, where the formulas $A_1$ and $B_1$ numerically express the predicates A and B, respectively. If we denote this formula by $P_q(x)$ (q is its Godel number), then the formula $P_q(\bar{q})$ is the desired example of a formula whose nondecidability is established with the aid of the assumption of the simple consistency of the formal arithmetic alone. The validity of this last assumption has been established by Ackermann and von Neumann with a certain limitation, and by Gentzen in the general case.

Novikov [59] has proved not only the simple consistency but even the ω-consistency of the arithmetic, although this required resorting to methods going beyond the framework of the formal arithmetic itself. From this result and Theorem 5 follows the Godel theorem on the incompleteness of the arithmetic:

Theorem 6. The formal arithmetic system with the axioms 1-24 is incomplete in the sense that it contains constant propositions (formulas which do not contain free variables) which can be neither proved nor refuted using the apparatus of this system.
One might think that Godel's result reveals only the insufficient completeness of the axiom system we selected for the formal arithmetic, and that by suitably supplementing this system with new axioms the incompleteness of the arithmetic (while retaining its consistency) could be removed. In actuality the matter is far from being this simple. As shown by the detailed analysis carried out by Godel, under any consistent extension of the axiom system (by axioms that can be effectively listed) the formal arithmetic remains incomplete, and just as before it contains nondecidable closed formulas. Moreover, every formal system which satisfies certain quite general conditions (the existence of a sufficiently extensive set of formulas and objects) will necessarily be incomplete if it is consistent.

As mentioned above, the proof of the consistency of the formal arithmetic system S which we have constructed required resorting to apparatus that goes beyond the framework of this system. It turns out that this is no accident: it can be shown that the consistency of the system S cannot be proved by the apparatus formalized in this very system.

Indeed, the consistency of the system S amounts to the impossibility of proving the formula 1 = 0 in it. If this system were inconsistent, all its formulas, and in particular the formula 1 = 0, would become demonstrable. The converse is also true: the demonstrability of the formula 1 = 0 implies, as a corollary, the inconsistency of the system S. Let r be the Godel number of the formula 1 = 0. Then the formula $\neg\exists y\, A_1(\bar{r}, y)$, which for brevity we denote by A, is, by the definition given above of the predicate A(x, y) with the numerical expression $A_1(x, y)$, the formal expression of the consistency of the system S.

It can be shown that the formalization (in the system S) of the proof of Theorem 5 reduces to the deduction (in S) of the formula $A \supset \forall y\,\neg A_1(\bar{p}, y)$, and the formalization (in S) of a proof of the consistency of the system S reduces to the deduction (in S) of the formula A. But if both of these deductions existed, then, by the deduction rule expressed by axiom 11 of the propositional calculus (see Chapter 2, §5), the formula $\forall y\,\neg A_1(\bar{p}, y)$ would also be deducible (demonstrable). Since this contradicts Theorem 5, the formula A cannot be demonstrable in the system S, which shows that the consistency of the formal arithmetic system cannot be proved using the apparatus of this system itself.

§3. CONCEPT OF AUTOMATION OF PROOFS AND CONSTRUCTION OF DEDUCTIVE THEORIES

The formal arithmetic system constructed in the preceding section is an example of the formalization of a mathematical theory on the basis of the predicate calculus. Such formalization makes it possible to decompose the process of proving every proposition demonstrable within the given theory into precisely defined elementary steps. By placing into the program of a universal electronic digital machine all the axioms and derivation rules of the theory in question, together with the formula expressing the proposition to be proved, we can organize a random search for a proof of this formula.

If the number of elementary steps needed to prove the required formula is relatively small, then the high operating speed of the electronic digital machine makes it possible to find the proof by simple sorting of all the variants. For any complex proposition, however, such a search method becomes unsuitable in practice, because the number of variants to be sorted becomes tremendously large, so that sorting them completely in a reasonable time is impossible even on modern high-speed electronic digital machines. Under these conditions we must use various techniques that sharply reduce the number of variants to be sorted. Such techniques include enlarging the deduction rules, so that the proof is built from larger blocks and, as a result, becomes considerably shorter. Another technique for shortening the sorting consists in developing various heuristic methods that make it possible to set intermediate goals and thereby break the proof search down into individual stages. These stages must be small enough for complete sorting within them to be possible.
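As a toy illustration of search by simple sorting of variants, the sketch below saturates a finite set of propositional formulas under modus ponens alone, round by round, until the goal appears. The single rule, the tuple encoding `("->", A, B)` for implication, the function name and the round limit are assumptions made for the example; a real formal theory has axiom schemes with infinitely many instances, which is exactly why unrestricted sorting becomes impractical.

```python
def forward_search(axioms, goal, max_rounds=20):
    """Blind sorting of variants: in each round apply modus ponens to every
    pair A, ("->", A, B) of formulas derived so far, until `goal` appears.

    Returns the number of rounds needed, or None if the set of derivable
    formulas saturates (or the round limit is hit) without reaching `goal`.
    """
    known = set(axioms)
    for rounds in range(max_rounds + 1):
        if goal in known:
            return rounds
        new = {f[2] for f in known
               if isinstance(f, tuple) and f[0] == "->" and f[1] in known}
        if new <= known:          # nothing new: the search space is exhausted
            return None
        known |= new
    return None


axioms = {"p", ("->", "p", "q"), ("->", "q", "r")}
print(forward_search(axioms, "r"))   # found after two rounds
print(forward_search(axioms, "s"))   # not derivable
```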

Usually, in the automation of proofs, one prefers to use a formalization of the predicate calculus somewhat different from the one developed in the first section of the present chapter. Gentzen proposed such a system of formalization in [15]. It permits, in a certain sense, the normalization of the process of proof using the formal apparatus of the predicate calculus. We shall present certain basic features of Gentzen's formalization of the predicate calculus, which, in contrast with the previously considered so-called Hilbert system H, we shall denote by G or, more precisely, by G1.

One of the significant concepts in the Gentzen system is the concept of the so-called sequence (in current terminology, a sequent). A sequence is a formal expression of the form $A_1, A_2, \ldots, A_m \to B_1, B_2, \ldots, B_n$, where the $A_i$ and $B_j$ are formulas and the arrow denotes a new formal symbol. The sequence $A_1, A_2, \ldots, A_m \to B_1, B_2, \ldots, B_n$ has the same interpretation as the formula $A_1 \land A_2 \land \cdots \land A_m \supset B_1 \lor B_2 \lor \cdots \lor B_n$ in the Hilbert system, where the conjunction of an empty set of formulas is considered true, and the disjunction of an empty set of formulas is considered false.
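In the propositional case this interpretation can be checked directly by truth tables. In the Python sketch below (the nested-tuple encoding of formulas and the function names are assumptions), Python's `all([])` being `True` and `any([])` being `False` match the conventions for an empty antecedent and an empty succedent:

```python
from itertools import product

# Hypothetical encoding: an atom is a string; compound formulas are tuples
# ("not", A), ("and", A, B), ("or", A, B), ("imp", A, B).


def value(f, env):
    """Truth value of formula f under the assignment env (atom -> bool)."""
    if isinstance(f, str):
        return env[f]
    op = f[0]
    if op == "not":
        return not value(f[1], env)
    if op == "and":
        return value(f[1], env) and value(f[2], env)
    if op == "or":
        return value(f[1], env) or value(f[2], env)
    if op == "imp":
        return (not value(f[1], env)) or value(f[2], env)
    raise ValueError(f"unknown connective {op!r}")


def atoms(f):
    """Set of propositional letters occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))


def sequence_holds(ante, succ, env):
    """A1,...,Am -> B1,...,Bn read as (A1 and ... and Am) implies (B1 or ... or Bn).
    all([]) is True and any([]) is False: the empty-side conventions."""
    return not all(value(a, env) for a in ante) or any(value(b, env) for b in succ)


def sequence_valid(ante, succ):
    """True if the sequence holds under every assignment to its letters."""
    letters = sorted(set().union(*(atoms(f) for f in [*ante, *succ])))
    return all(sequence_holds(ante, succ, dict(zip(letters, bits)))
               for bits in product([False, True], repeat=len(letters)))


print(sequence_valid([("not", ("or", "A", "B"))], [("not", "A")]))
print(sequence_valid([], []))   # the empty sequence reads "true implies false"
```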


The part of the sequence standing to the left of the symbol → is termed the antecedent, and the part standing to the right of this symbol is termed the succedent of the sequence. For brevity, finite sequences of formulas are denoted by capital Greek letters (Γ, Θ, Δ, Λ, etc.), and individual formulas by capital Latin letters.

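In the Gentzen system G1 there is the natural axiom (axiom scheme)

$$C \to C, \qquad (130)$$

and also a whole series of deduction rules, which are divided into deduction rules for the propositional calculus and additional deduction rules for the predicate calculus. In each rule the premises stand above the bar and the conclusion below it.

The deduction rules for the propositional calculus:

$$\frac{A, \Gamma \to \Theta, B}{\Gamma \to \Theta, A \supset B} \quad (\supset\text{-insertion in succedent}) \qquad (131)$$

$$\frac{\Delta \to \Lambda, A \qquad B, \Gamma \to \Theta}{A \supset B, \Delta, \Gamma \to \Lambda, \Theta} \quad (\supset\text{-insertion in antecedent}) \qquad (132)$$

$$\frac{\Gamma \to \Theta, A \qquad \Gamma \to \Theta, B}{\Gamma \to \Theta, A \land B} \quad (\land\text{-insertion in succedent}) \qquad (133)$$

$$\frac{A, \Gamma \to \Theta}{A \land B, \Gamma \to \Theta} \qquad \frac{B, \Gamma \to \Theta}{A \land B, \Gamma \to \Theta} \quad (\land\text{-insertion in antecedent}) \qquad (134)$$

$$\frac{\Gamma \to \Theta, A}{\Gamma \to \Theta, A \lor B} \qquad \frac{\Gamma \to \Theta, B}{\Gamma \to \Theta, A \lor B} \quad (\lor\text{-insertion in succedent}) \qquad (135)$$

$$\frac{A, \Gamma \to \Theta \qquad B, \Gamma \to \Theta}{A \lor B, \Gamma \to \Theta} \quad (\lor\text{-insertion in antecedent}) \qquad (136)$$

$$\frac{A, \Gamma \to \Theta}{\Gamma \to \Theta, \neg A} \quad (\neg\text{-insertion in succedent}) \qquad (137)$$

$$\frac{\Gamma \to \Theta, A}{\neg A, \Gamma \to \Theta} \quad (\neg\text{-insertion in antecedent}) \qquad (138)$$

Additional rules of deduction for the predicate calculus (t is an arbitrary term, b a free variable):

$$\frac{\Gamma \to \Theta, A(b)}{\Gamma \to \Theta, \forall x A(x)} \quad (\forall\text{-insertion in succedent}) \qquad (139)$$

$$\frac{A(t), \Gamma \to \Theta}{\forall x A(x), \Gamma \to \Theta} \quad (\forall\text{-insertion in antecedent}) \qquad (140)$$

$$\frac{\Gamma \to \Theta, A(t)}{\Gamma \to \Theta, \exists x A(x)} \quad (\exists\text{-insertion in succedent}) \qquad (141)$$

$$\frac{A(b), \Gamma \to \Theta}{\exists x A(x), \Gamma \to \Theta} \quad (\exists\text{-insertion in antecedent}) \qquad (142)$$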
In rules (139) and (142) a definite restriction must be observed: the variable b must not occur free in the conclusions (i.e., in the expressions under the bar) of (139) and (142).

We note that when the formula A(x) does not actually contain the free variable x, A(b) coincides with A(x). In this case the variable b can be arbitrary, so that as b we can always select a variable that does not occur in the conclusion and thereby observe the required restriction.

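In addition to the rules above, the Gentzen system has seven more so-called structural rules of deduction (refinement, abbreviation and section are more commonly known as thinning, contraction and cut):

$$\frac{\Gamma \to \Theta}{\Gamma \to \Theta, C} \quad (\text{refinement in succedent}) \qquad (143)$$

$$\frac{\Gamma \to \Theta}{C, \Gamma \to \Theta} \quad (\text{refinement in antecedent}) \qquad (144)$$

$$\frac{\Gamma \to \Theta, C, C}{\Gamma \to \Theta, C} \quad (\text{abbreviation in succedent}) \qquad (145)$$

$$\frac{C, C, \Gamma \to \Theta}{C, \Gamma \to \Theta} \quad (\text{abbreviation in antecedent}) \qquad (146)$$

$$\frac{\Gamma \to \Theta, C, D, \Lambda}{\Gamma \to \Theta, D, C, \Lambda} \quad (\text{permutation in succedent}) \qquad (147)$$

$$\frac{\Delta, C, D, \Gamma \to \Theta}{\Delta, D, C, \Gamma \to \Theta} \quad (\text{permutation in antecedent}) \qquad (148)$$

$$\frac{\Delta \to \Lambda, C \qquad C, \Gamma \to \Theta}{\Delta, \Gamma \to \Lambda, \Theta} \quad (\text{section}) \qquad (149)$$

To denote the demonstrability of the sequence S in the system G1, the abbreviated notation ⊢ S is used, similar to the corresponding notation in the Hilbert system H.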

The Gentzen system G1 is in a certain sense equivalent to the Hilbert system H since, as Gentzen showed, the following theorem holds:

Theorem 1. If the formula A is deducible from the finite set of formulas Γ in the Hilbert system H with all the variables held fixed, then the sequence Γ → A is deducible in the Gentzen system G1. Conversely, if the sequence Γ → A is deducible in the system G1, then the formula A is deducible from the set of formulas Γ, with all the variables held fixed.

The similarity between the Hilbert and Gentzen systems is so great that if a deduction performed in one system uses the rules (axioms) for only some of the logical operations ⊃, ¬, ∨, ∧, ∀, ∃, then the corresponding deduction in the other system can be restricted to the rules for the same symbols, with the possible exception of the implication symbol ⊃.

Gentzen established a result that makes it possible to eliminate from proofs in the system G1 the use of sections (deduction rule (149)). This is the so-called Gentzen normal form theorem, or elimination theorem.

Theorem 2. Suppose that in the system G1 there is given a proof of some sequence in which no variable occurs both free and bound. Then in G1 there is a proof of the same sequence which does not use sections (rule (149)) and uses only those logical rules that were used in the original proof.

Along with the system G1, Gentzen also constructed other formal systems (the systems G2, G3).

Hao Wang [77] used the Gentzen system G1 to automate (with the aid of a universal electronic digital machine) the proof of a large number of theorems not only of the propositional calculus but also of the (restricted) predicate calculus.

Hao Wang's experiments showed that, in spite of the absence of a universal decision procedure for the predicate calculus, one can construct a partial decision procedure that proves all the theorems usually included in a handbook on mathematical logic.

In the case of the propositional calculus, to prove a particular sequence Hao Wang uses the deduction rules (131)-(138) of the Gentzen system (supplemented by rules for inserting the equivalence symbol into the succedent and the antecedent) in the reverse direction (the conclusion is replaced by the premise or premises). In this process the logical connectives are eliminated one after another, beginning with the left end of the sequence. After a finite number of steps this procedure yields sequences of the form $A_1, A_2, \ldots, A_m \to B_1, B_2, \ldots, B_n$, where the $A_i$ and $B_j$ are so-called atomic formulas, i.e., simply speaking, propositional letters. Such sequences, which are naturally termed elementary, are demonstrable if and only if the same atomic formula occurs in both their left and right parts.

If all the elementary sequences obtained by this procedure are demonstrable, then the original sequence is obviously demonstrable: to prove it, it suffices to repeat, in reverse order, all the steps that led to these elementary sequences.

If the theorem to be proved is written as a formula of the Hilbert system H, then to convert it into a sequence it is sufficient to place an arrow in front of it. If the last operation performed in the original formula is implication, the formula can instead be converted into a sequence by replacing the corresponding symbol ⊃ with an arrow. This method of conversion usually leads to a shorter proof than writing the arrow in front of the formula. By the definition of the meaning of the symbol → in a sequence and by Theorem 1 of the present section, a proof of the sequence obtained by either of the two methods is also a proof of the original formula.

Let us consider as an example the formula $\neg(A \lor B) \supset \neg A$ of the propositional calculus. Replacing the implication symbol by the arrow, we convert it into the sequence $\neg(A \lor B) \to \neg A$. The leftmost logical connective is the negation symbol ¬. Reversal of the rule for ¬-insertion into the antecedent (rule (138)) brings our sequence to the form $\to \neg A, A \lor B$. Elimination of the next logical connective (again a negation) leads, by reversal of rules (147) and (137), to the sequence $A \to A \lor B$. Finally, reversal of rule (135) brings our sequence to the form $A \to A, B$, which is an elementary sequence. Since the letter A occurs in both the left and right parts of the last sequence, the sequence is demonstrable. Writing out in reverse order all the steps that led us to the sequence $A \to A, B$, we arrive at a proof of the original sequence $\neg(A \lor B) \to \neg A$.

For a proper understanding of the last step in this example of successive elimination of the logical connectives, we note that the deduction rule (135) can (as Hao Wang does) be written in the form

$$\frac{\Gamma \to \Theta, A, B}{\Gamma \to \Theta, A \lor B} \qquad (150)$$


Similarly, in rule (134) for ∧-insertion into the antecedent, the two premises can be replaced by one premise of the form $\Gamma, A, B \to \Theta$. The legitimacy of these changes of rules (134) and (135) is easily justified with the aid of the rules of refinement in the succedent and antecedent (rules (143) and (144)).
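Written forward, from axiom (130) to the original sequence, the proof obtained in the example above might be laid out as follows; the passage from the axiom to the elementary sequence uses refinement (143), and the step to $A \to A \lor B$ uses rule (135) in the form (150):

```latex
\begin{aligned}
&1.\quad A \to A                      && \text{axiom (130)}\\
&2.\quad A \to A, B                   && \text{from 1 by (143)}\\
&3.\quad A \to A \lor B               && \text{from 2 by (150)}\\
&4.\quad {} \to A \lor B, \neg A      && \text{from 3 by (137)}\\
&5.\quad {} \to \neg A, A \lor B      && \text{from 4 by (147)}\\
&6.\quad \neg(A \lor B) \to \neg A    && \text{from 5 by (138)}
\end{aligned}
```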

Of course, more efficient proof procedures can be constructed for the propositional calculus; the described procedure is good, however, in that it admits generalization to the predicate calculus. In this generalization the quantifiers are eliminated by reversing deduction rules (139)-(142), in complete analogy with the technique described above for eliminating the logical connectives by reversing deduction rules (131)-(138).

The decision procedure constructed by Hao Wang naturally encompasses only part of the formulas of the predicate calculus (since no decision procedure exists for the predicate calculus as a whole). However, it suffices to cover almost all the theorems of the predicate calculus included in such a major monograph as Whitehead and Russell's Principia Mathematica.

The efficiency of the decision procedure is improved by several additional techniques. An important place among them is occupied by the reduction of formulas to the so-called minisphere (miniscope) form. In contrast with the prenex form, in which the region of action of the quantifiers is as large as possible, the minisphere form reduces the region of action of the quantifiers as much as possible. When formulas are reduced to minisphere form, the operations of implication and equivalence are usually first expressed by means of disjunction, conjunction and negation. In this case the concept of the minisphere form can be made precise by means of the following conditions.

First, the individual propositional letters and elementary predicates are minisphere formulas. Second, if the formulas A and B have minisphere form, then the formulas $A \lor B$, $A \land B$ and $\neg A$ are also minispheric. Third, if P(x) is a disjunction (or, respectively, a conjunction) of minispheric formulas, then the formula $\forall x P(x)$ (or, correspondingly, the formula $\exists x P(x)$) also has minisphere form. Fourth, if the formula P(x) in $\forall x P(x)$ (or Q(x) in $\exists x Q(x)$) begins with an existential quantifier (or, respectively, with a generality quantifier) and the formula P(x) (or Q(x)) is minispheric, then the formula $\forall x P(x)$ (or $\exists x Q(x)$) is also minispheric. Finally, fifth, a formula which begins with a chain of like quantifiers has minisphere form if every formula obtained from it by permuting these quantifiers and dropping the first of them has minisphere form.

The procedure for reducing formulas of the predicate calculus to minisphere form is frequently quite simple. It is then advisable to begin the decision procedure by reducing both parts of the given sequence to minisphere form while simultaneously eliminating (wherever possible) all the logical connectives by reversing deduction rules (131)-(138).
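A simplified sketch of the reduction, assuming implication and equivalence have already been eliminated and that no quantifier needs to be moved through a negation: a quantifier is dropped when its variable does not occur free, a generality quantifier is distributed over conjunction and an existential one over disjunction, and otherwise only components not containing the variable are moved out of its region of action. This illustrates the idea only and is not Hao Wang's actual procedure; in particular it does not implement the permutation of like quantifiers from the fifth condition. The tuple encoding (`"all"`, `"ex"`, atoms such as `("P", "x")`) and the function names are hypothetical.

```python
# Hypothetical encoding: ("all", v, F), ("ex", v, F), ("not", A),
# ("and", A, B), ("or", A, B); any other tuple is an atom such as ("P", "x").


def free(f):
    """Free variables of f (every argument of an atom is treated as a variable)."""
    op = f[0]
    if op == "not":
        return free(f[1])
    if op in ("and", "or"):
        return free(f[1]) | free(f[2])
    if op in ("all", "ex"):
        return free(f[2]) - {f[1]}
    return set(f[1:])


def push(q, v, body):
    """Place quantifier q (over variable v) in front of body, as deep as possible."""
    if v not in free(body):
        return body                                   # vacuous quantifier
    op = body[0]
    if (q, op) in {("all", "and"), ("ex", "or")}:     # distributive cases
        return (op, push(q, v, body[1]), push(q, v, body[2]))
    if op in ("and", "or"):                           # move out a v-free part
        left, right = body[1], body[2]
        if v not in free(left):
            return (op, left, push(q, v, right))
        if v not in free(right):
            return (op, push(q, v, left), right)
    return (q, v, body)


def miniscope(f):
    """Shrink the region of action of every quantifier in f, innermost first."""
    op = f[0]
    if op == "not":
        return ("not", miniscope(f[1]))
    if op in ("and", "or"):
        return (op, miniscope(f[1]), miniscope(f[2]))
    if op in ("all", "ex"):
        return push(op, f[1], miniscope(f[2]))
    return f


print(miniscope(("all", "x", ("and", ("P", "x"), ("Q", "y")))))
```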

For the elimination of the quantifiers, instead of applying the (reversed) deduction rules (139)-(142), which involve certain restrictions, it is frequently advisable to use a simpler method based on the concept of the signs of the quantifiers occurring in the given sequence. To define this concept we first consider the assignment of signs to the various parts of formulas of the predicate calculus. First of all, each formula, considered as an occurrence in itself, is regarded as positive. If P is a positive (negative) part of the formula Q or of the formula R, then P is a positive (respectively, negative) part of the formulas $Q \land R$, $Q \lor R$, $\forall x Q$, $\exists x Q$. If D is a positive (negative) part of the formula S, then D is a negative (respectively, positive) part of the formulas $\neg S$ and $S \supset Q$, while D is a positive (respectively, negative) part of the formula $Q \supset S$.

If a part of any formula from the set of formulas composing a sequence is considered a part of the sequence, then it has in the sequence the same sign as in the corresponding formula if this formula occurs in the succedent, and the opposite sign if the formula occurs in the antecedent. Every generality quantifier in the sequence is assigned the sign that its region of action has in this sequence (the region of action being considered a part of the sequence). The signs of the existential quantifiers are taken to be opposite to the signs of their regions of action. For example, in the sequence $\forall x \exists y P(x, y), \neg\forall v Q(v) \to \forall z R(z) \land \exists u S(u)$ the quantifiers ∀z, ∀v and ∃y are positive, while the quantifiers ∀x and ∃u are negative. In establishing the signs of the quantifiers it is necessary that all the variables bound by the quantifiers be pairwise distinct, and their notations must also differ from those of all the free variables. When these conditions are satisfied, the following decision procedure can be constructed for sequences in AE-form, i.e., sequences consisting of formulas in which no existential quantifier includes generality quantifiers in its region of action.
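The sign rules translate into a short recursive walk. The sketch below uses a hypothetical tuple encoding and function names, and relies on the requirement that bound variables be pairwise distinct, so that each variable identifies its quantifier; it reproduces the signs in the example above:

```python
# Hypothetical encoding: ("all", v, F), ("ex", v, F), ("not", A),
# ("and", A, B), ("or", A, B), ("imp", A, B); other tuples are atoms.


def assign_signs(formula, sign, signs):
    """Record the sign (+1 or -1) of every quantifier in `formula`, given the
    sign that `formula` itself has as a part of the sequence."""
    op = formula[0]
    if op in ("all", "ex"):
        # The region of action keeps the sign of the quantified formula;
        # a generality quantifier takes that sign, an existential one its opposite.
        signs[formula[1]] = sign if op == "all" else -sign
        assign_signs(formula[2], sign, signs)
    elif op == "not":
        assign_signs(formula[1], -sign, signs)
    elif op == "imp":
        assign_signs(formula[1], -sign, signs)   # S in (S imp Q) changes sign
        assign_signs(formula[2], sign, signs)    # S in (Q imp S) keeps it
    elif op in ("and", "or"):
        assign_signs(formula[1], sign, signs)
        assign_signs(formula[2], sign, signs)
    return signs                                  # atoms add nothing


def quantifier_signs(antecedent, succedent):
    """Signs of all quantifiers of a sequence, keyed by bound variable."""
    signs = {}
    for f in antecedent:
        assign_signs(f, -1, signs)   # antecedent formulas: opposite sign
    for f in succedent:
        assign_signs(f, +1, signs)   # succedent formulas: same sign
    return signs


print(quantifier_signs(
    [("all", "x", ("ex", "y", ("P", "x", "y"))), ("not", ("all", "v", ("Q", "v")))],
    [("and", ("all", "z", ("R", "z")), ("ex", "u", ("S", "u")))],
))
```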

First, all the formulas occurring in the sequence are reduced to minisphere form. Then, by reversing rules (131)-(138), we eliminate all the logical (propositional) connectives that admit such elimination. The resulting sequences must be in AE-form (since otherwise the original sequence would not be an AE-sequence). In all these sequences all the quantifiers are omitted; the variables bound by the negative quantifiers are replaced by pairwise different numbers, and the variables bound by the positive quantifiers are retained without change.

Again applying the reversal of rules (131)-(138), we reduce the resulting sequences to elementary form, i.e., to a form containing neither quantifiers nor propositional connectives. The true elementary sequences (i.e., those in which at least one formula is common to the antecedent and the succedent) are discarded. Performing all possible (not necessarily one-to-one) substitutions of variables for the numbers in the remaining sequences, we try to make all of them true. If this can be done, the original sequence was true; if not, it was false.
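Only the last step of the procedure is sketched here: given elementary sequences in which numbers stand for the variables of negative quantifiers, try every (not necessarily one-to-one) assignment of variables to the numbers. The candidate variables are the free variables present plus one fresh variable; the name `"x"` for it, the encoding of atoms as tuples such as `("P", 1)`, and the function names are assumptions. The fresh variable matters only when no variable is present at all, as in the first example below.

```python
from itertools import product

# Hypothetical encoding: an elementary sequence is a pair (antecedent,
# succedent) of lists of atoms; an atom is a tuple such as ("P", 1) or
# ("P", "x"), where an int stands for a number introduced for a negative
# quantifier and a str is a variable.


def is_true(ante, succ):
    """A true elementary sequence has an atom common to both sides."""
    return bool(set(ante) & set(succ))


def substitute(atom, mapping):
    """Replace numbers in the atom by variables according to mapping."""
    return (atom[0],) + tuple(mapping.get(t, t) for t in atom[1:])


def close_by_substitution(sequences, fresh="x"):
    """Try all (not necessarily one-to-one) substitutions of variables for the
    numbers that would make every remaining elementary sequence true."""
    remaining = [(a, s) for a, s in sequences if not is_true(a, s)]
    terms = {t for a, s in remaining for atom in a + s for t in atom[1:]}
    numbers = sorted(t for t in terms if isinstance(t, int))
    variables = sorted(t for t in terms if isinstance(t, str)) + [fresh]
    for choice in product(variables, repeat=len(numbers)):
        mapping = dict(zip(numbers, choice))
        if all(is_true([substitute(x, mapping) for x in a],
                       [substitute(x, mapping) for x in s])
               for a, s in remaining):
            return True
    return False


print(close_by_substitution([([("P", 1)], [("P", 2)])]))      # P(1) -> P(2): True
print(close_by_substitution([([("P", "x")], [("P", "y")])]))  # P(x) -> P(y): False
```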

Let us consider as an example the two sequences $\forall x P(x) \to \exists y P(y)$ and $\exists x P(x) \to \forall y P(y)$. In the first sequence both quantifiers are negative. Therefore the described procedure for removing the quantifiers reduces it to the form $P(1) \to P(2)$. Substituting the variable x for the numbers 1 and 2, we transform the latter sequence into $P(x) \to P(x)$. Consequently, the first of the given sequences is true. Both quantifiers of the second sequence are positive. The procedure for eliminating the quantifiers reduces it to the form $P(x) \to P(y)$, which in general (for an arbitrary predicate P and a nontrivial object domain) is a false sequence. Consequently, the second of the original sequences is false.

These results coincide with those of a direct verification of the given
sequences, which is not difficult to carry out in this case. We note
that if, contrary to the condition stipulated above, both bound
variables in the second sequence were designated by the same letter, we
would come to an incorrect conclusion, taking the sequence to be true.
It is also useful to note that the described procedure, even without
the preliminary reduction of the formulas to the minisphere form, is
suitable for deciding all AE-sequences containing no more than one
positive quantifier.

Along with the decision procedure described above for the propositional
calculus (procedure I), the procedure just described, without the
reduction of the formulas to the minisphere form (procedure II), was
programmed by Hao Wang for the IBM 704 general-purpose electronic
digital computer.

Using program I, the machine required about three minutes to prove all
220 theorems of the propositional calculus that make up the first five
chapters of the monograph Principia Mathematica. The total machine
operating time (including the time for data input and output of
results) amounted to about 37 minutes. Using program II, after about an
hour of operation the machine had proved about 130 of the 158 theorems
of the predicate calculus constituting the next five chapters of the
same monograph. In all, program II was able to prove 139 theorems,
although this required a considerably longer decision time.

If we supplement procedure II with the technique for the elimination of
quantifiers based on reversing rules (139)-(142), and introduce certain
preliminary transformations of the formulas that make up the original
sequence, then all 158 of the theorems indicated above become provable.
These preliminary transformations consist in applying (as long as
possible) the following replacement rules to the formulas that make up
the original sequence:

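$\forall x\,(P(x) \wedge Q(x))$ is replaced by $\forall x\,P(x) \wedge \forall x\,Q(x)$;   (151)
$\exists x\,(P(x) \vee Q(x))$ is replaced by $\exists x\,P(x) \vee \exists x\,Q(x)$;   (152)
$\forall x\,(P(x) \supset (Q(x) \wedge R(x)))$ is replaced by $\forall x\,(P(x) \supset Q(x)) \wedge \forall x\,(P(x) \supset R(x))$.   (153)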
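Applying rules (151)-(153) "as long as possible" amounts to rewriting a formula to a fixed point. The Python sketch below assumes a hypothetical tuple representation of formulas. It rewrites the subformulas first and then tries the three rules at the root. Each rule pushes a quantifier onto strictly smaller subformulas, so the recursion terminates.

```python
def rewrite(f):
    """Apply replacement rules (151)-(153) as long as possible (bottom-up)."""
    kind = f[0]
    if kind == "atom":
        return f
    # First rewrite the subformulas.
    if kind in ("forall", "exists"):
        f = (kind, f[1], rewrite(f[2]))
    elif kind == "not":
        f = ("not", rewrite(f[1]))
    else:
        f = (kind, rewrite(f[1]), rewrite(f[2]))
    if kind == "forall" and f[2][0] == "and":                               # rule (151)
        x, (_, a, b) = f[1], f[2]
        return ("and", rewrite(("forall", x, a)), rewrite(("forall", x, b)))
    if kind == "exists" and f[2][0] == "or":                                # rule (152)
        x, (_, a, b) = f[1], f[2]
        return ("or", rewrite(("exists", x, a)), rewrite(("exists", x, b)))
    if kind == "forall" and f[2][0] == "implies" and f[2][2][0] == "and":  # rule (153)
        x, (_, p, (_, q, r)) = f[1], f[2]
        return ("and", rewrite(("forall", x, ("implies", p, q))),
                       rewrite(("forall", x, ("implies", p, r))))
    return f


P, Q, R = (("atom", name, ("x",)) for name in "PQR")
print(rewrite(("forall", "x", ("and", P, Q))))
# ('and', ('forall', 'x', ('atom', 'P', ('x',))), ('forall', 'x', ('atom', 'Q', ('x',))))
```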

To a certain degree these rules replace the procedure for reducing the
formulas to the minisphere form, which in the general case is quite
complex. If, after these rules are applied and the logical connectives
are eliminated by reversing rules (133)-(138), all the resulting
sequences are AE-sequences and, in addition, are either minispheric or
contain no more than one positive quantifier, then the sequence can, as
a rule, be decided by procedure II.

Hao Wang also proposed further improvements of the described procedures
which make it possible to go beyond the limits of AE-formulas alone. We
note that, with one of these improved procedures, the IBM 704 proved 350
theorems from the first nine chapters of Principia Mathematica in 8.3
minutes. The procedures constructed by Hao Wang can apparently be
transformed without difficulty into quasi-decision procedures for the
entire restricted predicate calculus, in the sense that (after suitable
supplementation) they can prove any provable formula of this calculus
and can refute "almost all" of the unprovable formulas. The expression
"almost all" is understood here in a purely practical sense and, of
course, cannot be taken to mean "all except for a finite number."

We should emphasize the difference between the purely theoretical and
the practical approaches to the decision problem. In the theoretical
aspect, prime importance attaches to the very fact of the existence or
nonexistence of a decision procedure for a particular class of
formulas. The decision procedures constructed for this purpose are in
most cases completely unsuitable for automating the proofs of theorems,
since they lead to excessively cumbersome and lengthy constructions.

On the other hand, in the practical approach to the construction of
decision procedures, particular attention is devoted to the speed and
ease of carrying out these procedures.

At the same time we often accept that the constructed decision
procedure does not cover absolutely all the formulas of the given class,
provided that in practical use the cases in which it gives no answer
(within some predetermined time) are relatively infrequent. Thus,
practical decision procedures may be not decision procedures in the
exact sense of the word, but only quasi-decision procedures.


Therefore it is not surprising that in practice effective decision
procedures can be constructed not only in decidable theories but also in
undecidable ones. We should not forget that a human being working in
some undecidable theory (for example, the arithmetic of the natural
numbers) uses a finite (and frequently not even very large) number of
techniques for carrying out proofs and constructing counterexamples. The
task of practical decision procedures is then to formalize these
techniques.

Of course, the solution of this problem is simplified if the domain of
application of the decision procedure is limited in advance to some
sufficiently narrow region. At the same time, establishing beforehand
that the decision problem is solvable in principle in the corresponding
region does not, generally speaking, simplify the problem of
constructing a practical decision algorithm.

Decision procedures have been constructed for several relatively simple
branches of mathematics (the algebra of real polynomials, elementary
geometry, the theory of Abelian groups with a finite number of
generators, etc.). However, these procedures were, as a rule,
constructed from a purely theoretical standpoint, and considerable
effort will be required to transform them into practical decision
algorithms.

Of great interest is the problem of constructing algorithms that would
not simply prove or disprove propositions specified by a human but would
themselves search out new interesting theorems in a particular field. To
construct an algorithm of this sort, it is necessary to develop
sufficiently good criteria for evaluating the degree of nontriviality of
a theorem.
One of the first attempts in this direction was made by Hao Wang [77],
who constructed a program for the screening (with subsequent proof) of
theorems of the propositional calculus. This attempt, however, was not
completely successful because of the paucity of the nontriviality
criteria included in the program: the machine printed out too many
theorems without adequately screening out the uninteresting (trivial)
ones.

The first nontriviality criterion that usually comes to mind is that a
nontrivial theorem must be relatively concisely formulated (be expressed
by a short formula) and yet have no short proofs. Still more natural
criteria (in agreement with the conventional ideas on the nontriviality
of theorems) can be established if the process of screening new
theorems and their proofs is constructed on the principles of
self-organization. This can be achieved by supplementing the original
axiom system with nontrivial theorems selected by the program. It is
natural to evaluate the complexity of a theorem by the minimal number of
steps in which its proof can be accomplished. We call the original
axioms, and all the theorems whose complexity exceeds some threshold
selected in advance, nontrivial propositions. Each newly proved
nontrivial proposition is adjoined to the axiom system, whereupon the
complexity of all the previously obtained theorems is re-evaluated.
Excluding from the axiom system the theorems that have become trivial,
we look for a new nontrivial theorem, adjoin it to the axiom system,
again exclude the theorems that have become trivial, and so on.
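The self-organizing loop can be illustrated on a deliberately tiny, hypothetical deductive system. Statements are integers, each inference rule is a one-premise function, and the complexity of a theorem is the minimal number of rule applications deriving it from the current axiom system (computed by breadth-first search). The short-formula criterion is imitated by scanning candidates in increasing order. None of these choices comes from the text; the sketch only shows the adjoin, re-evaluate and exclude cycle.

```python
def complexity(target, axioms, rules, max_steps):
    """Minimal number of rule applications deriving target (None if > max_steps)."""
    frontier, seen = set(axioms), set(axioms)
    for steps in range(max_steps + 1):
        if target in frontier:
            return steps
        next_frontier = set()
        for s in frontier:
            for rule in rules:
                t = rule(s)
                if t not in seen:
                    seen.add(t)
                    next_frontier.add(t)
        if not next_frontier:
            return None
        frontier = next_frontier
    return None


def grow_theory(axioms, rules, candidates, threshold, max_steps=50, rounds=100):
    original, system = set(axioms), set(axioms)
    for _ in range(rounds):
        # Look for the "shortest" (here: smallest) candidate whose proof is long.
        new = None
        for c in sorted(candidates):
            if c in system:
                continue
            k = complexity(c, system, rules, max_steps)
            if k is not None and k > threshold:
                new = c
                break
        if new is None:
            break
        system.add(new)
        # Re-evaluate adjoined theorems; exclude those that have become trivial.
        changed = True
        while changed:
            changed = False
            for t in sorted(system - original):
                k = complexity(t, system - {t}, rules, max_steps)
                if k is not None and k <= threshold:
                    system.discard(t)
                    changed = True
                    break
    return sorted(system)


print(grow_theory({1}, [lambda n: n + 1], range(1, 20), threshold=3))  # [1, 5, 9, 13, 17]
```

With the single rule n → n + 1 and threshold 3, each adjoined theorem makes the next three integers trivial, so the loop keeps every fourth one.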

This self-organizing system for the construction of formal deductive
theories resembles to a considerable degree the process by which humans
construct such theories. We should note that the transition to
constructing deductive theories on self-organization principles forces
a new approach to the problem of their decidability. Of course, if the
indicated process runs in isolation from the outside world, then in the
final analysis it is equivalent to some "rigid" (non-self-organizing)
algorithm, so that nothing actually changes in the formulation of the
decidability problem. The same will obviously be true when the process
is influenced by some algorithm external to it (the case of a
"constructive external world").

In the case of a "nonconstructive external world," when the external
actions on the process under consideration cannot be reduced to an
algorithm, the situation changes in principle. Indeed, let us assume
that the process in question can accumulate information coming from
outside and compare it with the formulas of the restricted predicate
calculus that are presented to it.

Let us assume further that the information arriving from outside
consists of two sequences of formulas of the restricted predicate
calculus, arranged in order of increasing complexity (measured by the
number of symbols in the formula). If the first sequence contains all
the true formulas of the predicate calculus and the second contains all
the false ones (which is not impossible in the case of a
"nonconstructive medium"), then it is not difficult to construct a
completely constructive decision procedure for the (restricted)
predicate calculus, based on accumulating an ever greater quantity of
external information and comparing it with the formulas to be decided.
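The decision procedure based on a "nonconstructive medium" can be sketched as follows. The medium is modeled (hypothetically) as two Python iterables of formula strings, each ordered by length. The procedure reads both streams alternately. Because each stream is ordered by the number of symbols, a stream that has moved past the length of the formula can be abandoned. Termination rests entirely on the assumption that the two streams together contain every formula; the streams may be infinite generators.

```python
_END = object()  # sentinel marking an exhausted stream


def decide_with_medium(formula, true_stream, false_stream):
    """Decide `formula` by reading two externally supplied, length-ordered streams."""
    streams = [(iter(true_stream), True), (iter(false_stream), False)]
    active = [True, True]
    while any(active):
        for i, (it, verdict) in enumerate(streams):
            if not active[i]:
                continue
            x = next(it, _END)
            if x is _END or len(x) > len(formula):
                # Exhausted, or already past formulas of this length: since the
                # stream is ordered by length, `formula` cannot appear in it later.
                active[i] = False
            elif x == formula:
                return verdict
    return None  # possible only if the streams are not complete, as assumed above


# Toy stand-ins for the two external sequences (hypothetical strings).
print(decide_with_medium("bb", ["a", "bb", "ccc"], ["d", "ee", "fff"]))  # True
print(decide_with_medium("ee", ["a", "bb", "ccc"], ["d", "ee", "fff"]))  # False
```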

Obviously, the "nonconstructive medium" is not at all obliged to take
the entire decision task upon itself, as was actually the case in the
example presented. The nonconstructive sequences that it generates need
not even be sequences of formulas directly. They need possess only one
property: the possibility of their constructive transformation (within
the framework of the self-organizing decision procedure under
consideration) into suitably ordered sequences of provable (true) and
unprovable (false) formulas of the restricted predicate calculus.

It  is  possible  that  these  considerations  may  serve  in  the  future 
as  the  basis  for  systems  for  far-reaching  automation  of  the  processes 
of  scientific  creativity,  principally  the  automation  of  the  process  of 
the  construction  of  complex  deductive  theories. 